HVT: An Introduction

Zubin Dowlaty, Shubhra Prakash, Sangeet Moy Das, Praditi Shah, Shantanu Vaidya, Somya Shambhawi

2024-02-20

1. Abstract

The HVT package is a collection of R functions that facilitates building topology preserving maps for rich multivariate data analysis, particularly for datasets tending towards big data, i.e., a large number of rows. The functions for this typical workflow are organized below:

  1. Data Compression: Vector quantization (VQ), HVQ (hierarchical vector quantization) using means or medians. This step compresses the rows (long data frame) using a compression objective.

  2. Data Projection: Dimension projection of the compressed cells to 1D,2D or Interactive surface plot with the Sammons Non-linear Algorithm. This step creates topology preserving map (also called an embedding) coordinates into the desired output dimension.

  3. Tessellation: Create cells required for object visualization using the Voronoi Tessellation method, package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology preserving map useful for semi-supervised tasks.

  4. Scoring: Scoring new data sets and recording their assignment using the map objects from the above steps, in a sequence of maps if required.

2. Data Compression

Compression is a technique used to reduce data size while preserving its essential information, allowing for efficient storage and decompression to reconstruct the original data. Vector quantization (VQ) is a compression technique that represents a set of data points with a smaller number of representative vectors. It achieves compression by exploiting redundancies or patterns in the data and replacing similar data points with representative vectors.

This package offers several advantages for data compression, as it is designed to handle high-dimensional data efficiently. It provides a hierarchical compression approach, allowing a multi-resolution representation of the data: the hierarchical structure enables efficient compression and storage while preserving different levels of detail. HVT aims to preserve the topological structure of the data during compression. Spatial data with irregular shapes and complex structures in high dimensions can contain valuable information about relationships and patterns; HVT seeks to capture and retain these topological characteristics, enabling meaningful analysis and visualization. The package employs tessellation to divide the compressed data space into distinct cells or regions while preserving the topology of the original data. This means that the relationships and connectivity between data points are maintained in the compressed representation.

This package can perform vector quantization using the following algorithms:

2.1 Hierarchical Vector Quantization

2.1.1 Using k-means

  1. The k-means algorithm randomly selects k data points as initial means.
  2. k clusters are formed by assigning each data point to its closest cluster mean using the Euclidean distance.
  3. Virtual means for each cluster are calculated using all the data points contained in that cluster.

The second and third steps are iterated until a predefined number of iterations is reached or the clusters converge. The runtime for the algorithm is O(n).
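The clustering step above can be sketched with base R's stats::kmeans (a minimal illustration of the quantization idea, not the package's hvq internals):

```r
# Quantize 100 two-dimensional points into 5 cells with k-means
set.seed(42)
dat <- matrix(rnorm(200), ncol = 2)
km  <- stats::kmeans(dat, centers = 5, iter.max = 100)
km$centers        # the 5 codewords (cell centroids)
table(km$cluster) # number of points assigned to each cell
```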

2.1.2 Using k-medoids

  1. The k-medoids algorithm randomly selects k of the n data points as the initial medoids.
  2. k clusters are formed by assigning each data point to its closest medoid by using any common distance metric methods.
  3. The medoid of each cluster is updated to the data point that minimizes the total distance to the other points in that cluster.

The second and third steps are iterated until a predefined number of iterations is reached or the clusters converge. The runtime for the algorithm is O(k * (n-k)^2).
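A comparable sketch for the medoid-based variant uses pam from the cluster package (an illustration only; the parameter names are pam's, not trainHVT's):

```r
# Quantize 100 two-dimensional points into 5 cells with k-medoids (PAM)
library(cluster)
set.seed(42)
dat <- matrix(rnorm(200), ncol = 2)
pm  <- pam(dat, k = 5, metric = "manhattan") # L1 distance
pm$medoids           # actual data points chosen as cell representatives
table(pm$clustering) # cell membership counts
```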

This algorithm divides the dataset recursively into cells using the \(k\)-means or \(k\)-medoids algorithm. The maximum number of subsets is decided by setting n_cells to, say, five in order to divide the dataset into a maximum of five subsets. These five subsets are further divided into five subsets (or fewer), resulting in a total of up to twenty-five (5*5) subsets. The recursion terminates when a cell contains fewer than three data points or when the stop criterion is reached; in this case, the stop criterion is met when the cell's quantization error falls below the quantization threshold.

The steps for this method are as follows:

  1. Select k(number of cells), depth and quantization error threshold.
  2. Perform quantization (using \(k-means\) or \(k-medoids\)) on the input dataset.
  3. Calculate quantization error for each of the k cells.
  4. Compare the quantization error for each cell to quantization error threshold.
  5. Repeat steps 2 to 4 for each of the k cells whose quantization error is above threshold until stop criterion is reached.

The stop criterion is met when the quantization error of a cell satisfies one of the conditions below:

  • it falls below the quantization error threshold.
  • there are fewer than three data points in the cell.
  • the user-specified depth has been attained.
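The recursion described above can be sketched as a plain R function (an illustrative toy assuming the L1 norm and the max error metric; the package's hvq implementation differs in detail):

```r
# Recursively quantize: stop on depth, cell size, or QE threshold
hvq_sketch <- function(dat, n_cells = 3, depth = 2, qe_thresh = 0.5, level = 1) {
  if (level > depth || nrow(dat) < 3) return(NULL)
  km <- stats::kmeans(dat, centers = min(n_cells, nrow(dat) - 1))
  lapply(seq_len(nrow(km$centers)), function(i) {
    cell <- dat[km$cluster == i, , drop = FALSE]
    qe   <- max(rowSums(abs(sweep(cell, 2, km$centers[i, ])))) # max L1 distance
    list(centroid = km$centers[i, ], qe = qe,
         children = if (qe > qe_thresh)
           hvq_sketch(cell, n_cells, depth, qe_thresh, level + 1))
  })
}
```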

The quantization error for a cell is defined as follows:

\[QE = \max_i(||A-F_i||_{p})\]

where

  • \(A\) is the centroid of the cell
  • \(F_i\) represents a data point in the cell
  • \(m\) is the number of points in the cell
  • \(p\) is the \(p\)-norm metric. Here \(p\) = 1 represents L1 Norm and \(p\) = 2 represents L2 Norm

2.1.3 Quantization Error

Let us try to understand quantization error with an example.

Figure 1: The Voronoi tessellation for level 1 shown for the 5 cells with the points overlayed


An example of a 2 dimensional VQ is shown above.

In the above image, we can see 5 cells with each cell containing a certain number of points. The centroid for each cell is shown in blue. These centroids are also known as codewords since they represent all the points in that cell. The set of all codewords is called a codebook.

Now we want to calculate quantization error for each cell. For the sake of simplicity, let’s consider only one cell having centroid A and m data points \(F_i\) for calculating quantization error.

For each point, we calculate the distance between the point and the centroid.

\[ d = ||A - F_i||_{p} \]

In the above equation, p = 1 denotes the L1 norm (Manhattan) distance, whereas p = 2 denotes the L2 norm (Euclidean) distance. In the package, the L1 norm distance is chosen by default. The user can pass either L1_Norm, L2_Norm, or a custom function to calculate the distance between two points in n dimensions.

\[QE = \max_i(||A-F_i||_{p})\]

Now, we take the maximum of the calculated distances over all m points. This gives us the furthest distance of a point in the cell from the centroid, which we refer to as the Quantization Error. If the Quantization Error is higher than the given threshold, the centroid/codeword is not a good representation of the points in the cell, so we can perform further vector quantization on these points and repeat the above steps.

Please note that the user can select mean, max, or any custom function to calculate the Quantization Error. The custom function takes a vector of m values (where each value is the distance between a point in n dimensions and the centroid) and returns a single value, which is the Quantization Error for the cell.

If we select mean as the error metric, the above Quantization Error equation will look like this:

\[QE = \frac{1}{m}\sum_{i=1}^m||A-F_i||_{p}\]
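As a concrete check, both error metrics can be computed directly in R for a small cell of three 2-D points:

```r
# Three points in a cell and their centroid
cell <- matrix(c(1, 2,
                 2, 1,
                 3, 3), ncol = 2, byrow = TRUE)
A <- colMeans(cell)                  # centroid: (2, 2)
d <- rowSums(abs(sweep(cell, 2, A))) # L1 distance of each point: 1, 1, 2
max(d)  # QE with the max metric  -> 2
mean(d) # QE with the mean metric -> 4/3
```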

3. Data Projection

Projection mainly involves converting data from its original form to a different space or coordinate system while preserving certain properties of it. By projecting data into a common coordinate system, spatial relationships, distances, areas, and other spatial attributes can be accurately measured and compared.

HVT performs projection as part of its workflow to visualize and explore high-dimensional data. The projection step involves mapping the compressed data, represented by the hierarchical structure of cells, onto a lower-dimensional space for visualization, since human perception is better suited to interpreting information in lower dimensions. Users can zoom in/out, rotate, and explore different regions of the projected space to gain insights and understand the data from different perspectives.

Sammon’s projection is an algorithm used in this package to map a high-dimensional space to a space of lower dimensionality while attempting to preserve the structure of inter-point distances in the projection. It is particularly suited for use in exploratory data analysis and is usually considered a non-linear approach since the mapping cannot be represented as a linear combination of the original variables. The centroids are plotted in 2D after performing Sammon’s projection at every level of the tessellation.

Denoting the distance between the \(i^{th}\) and \(j^{th}\) objects in the original space by \(d_{ij}^*\) and the distance between their projections by \(d_{ij}\), Sammon's mapping aims to minimize the error function below, often referred to as Sammon's stress or Sammon's error.

\[E=\frac{1}{\sum_{i<j} d_{ij}^*}\sum_{i<j}\frac{(d_{ij}^*-d_{ij})^2}{d_{ij}^*}\]

The minimization of this can be performed either by gradient descent, as proposed initially, or by other means, usually involving iterative methods. The number of iterations needs to be experimentally determined, and convergent solutions are not always guaranteed. Many implementations prefer to use the first principal components as a starting configuration.
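A minimal example with MASS::sammon, the function this package relies on (its default initial configuration is classical MDS via cmdscale):

```r
# Project the 4-D iris measurements to 2-D with Sammon's mapping
library(MASS)
X  <- unique(iris[, 1:4]) # sammon() requires distinct points
sm <- sammon(dist(X), k = 2, trace = FALSE)
dim(sm$points) # one 2-D coordinate per input row
sm$stress      # final value of Sammon's stress
```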

4. Tessellation

A Voronoi diagram is a way of dividing space into a number of regions. A set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other. These regions are called Voronoi cells. The Voronoi diagram is complementary to the Delaunay triangulation, a geometric algorithm that creates a triangulated mesh from a set of points in a plane with the property that no data point lies within the circumcircle of any triangle in the triangulation. This property guarantees that the resulting cells in the tessellation do not overlap with each other.

By using Delaunay triangulation, HVT can achieve a partitioning of the data space into distinct and non-overlapping regions, which is crucial for accurately representing and analyzing the compressed data. Additionally, the use of Delaunay triangulation for tessellation ensures that the resulting cells have well-defined shapes, typically triangles in two dimensions or tetrahedra in three dimensions.

The hierarchical structure resulting from tessellation preserves the inherent structure and relationships within the data. It captures clusters, subclusters, and other patterns in the data, allowing for a more organized and interpretable representation. The hierarchical structure reduces redundancy and enables more compact representations.

Tessellate: Constructing Voronoi Tessellation

In this package, we use the sammon function from the MASS package to project higher-dimensional data to a 2D space. The function hvq, called from the trainHVT function, returns hierarchical quantized data which becomes the input for constructing the tessellations. The data is then represented in 2D coordinates, and the tessellations are plotted using these coordinates as centroids. We use the deldir package for this purpose. The deldir package computes the Delaunay triangulation (and hence the Dirichlet or Voronoi tessellation) of a planar point set according to the second (iterative) algorithm of Lee and Schacter. For subsequent levels, a transformation is performed on the 2D coordinates to bring all the points within their parent tile, and tessellations are plotted using these transformed points as centroids. The lines in the tessellations are clipped where necessary so that they do not protrude outside the parent polygon. This is done for all subsequent levels.
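A small standalone illustration of the deldir step (toy points rather than projected centroids):

```r
# Voronoi (Dirichlet) tessellation of 10 random 2-D points
library(deldir)
set.seed(42)
x <- runif(10); y <- runif(10)
dd <- deldir(x, y)
head(dd$dirsgs)           # line segments of the Voronoi tessellation
plot(dd, wlines = "tess") # draw only the Voronoi cells
```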

5. Scoring

Scoring refers to the process of estimating values or outcomes for new data based on patterns learned from existing data. In the training process, a model is developed from historical data or a training dataset, and this model is then used to score new, unseen data. The model captures the underlying patterns, trends, and relationships present in the training data, allowing it to pinpoint the cell of similar or related data points.

In this package, we use the scoreHVT function to score each point in the test dataset.

Scoring Algorithm

The Scoring algorithm recursively calculates the distance between each point in the test dataset and the cell centroids for each level. The following steps explain the scoring method for a single point in the test dataset:

  1. Calculate the distance between the point and the centroid of all the cells in the first level.
  2. Find the cell whose centroid has minimum distance to the point.
  3. Check if the cell drills down further to form more cells.
  4. If it does not, return the path. Otherwise, repeat steps 1 to 3 at the next level until we reach a cell that does not drill down further.
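A single level of this assignment reduces to a nearest-centroid lookup, sketched here with made-up centroids and an L1 distance:

```r
# Assign a new point to the cell with the nearest centroid
centroids <- rbind(c(0, 0), c(5, 5), c(0, 5))
new_point <- c(4, 4)
d <- apply(centroids, 1, function(ctr) sum(abs(new_point - ctr)))
which.min(d) # winning cell -> 2
```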

6. Example I: HVT with the Torus dataset

6.1 Import Dataset

6.1.1 Import Dataset from Local

The user can provide an absolute or relative path in the cell below to access the data from his/her computer. The user can set the import_data_from_local variable to TRUE to upload a dataset from local storage.
Note: For this example, import_data_from_local has been set to FALSE, as we are simulating a dataset in the next section.

import_data_from_local <- FALSE ## set to TRUE to load dataset from local

file_name <- "Computers.csv"
file_path <- "https://raw.githubusercontent.com/Mu-Sigma/HVT/master/vignettes/sample_dataset/"

# Loading the data into the R environment
# Please change the path above to the location of the .csv file
if(import_data_from_local){
  file_load <- paste0(file_path, file_name)
  dataset_updated <- as.data.frame(fread(file_load))
  if(nrow(dataset_updated) > 0){
    paste0("File ", file_name, " having ", nrow(dataset_updated), " row(s) and ", ncol(dataset_updated), " column(s) imported successfully.") %>% cat("\n")
    # Round only the numeric columns in dataset
    dataset_updated <- dataset_updated %>% mutate_if(is.numeric, round, digits = 4)
    paste0("Code chunk executed successfully. Below table showing first 10 row(s) of the dataset.") %>% cat("\n")
    # Display imported dataset
    dataset_updated %>% head(10) %>%
      as.data.frame() %>%
      DT::datatable(options = options, rownames = TRUE)
  }
  # when a csv file is read, it creates a row number column; remove it for further processing
  dataset_updated <- dataset_updated[, -1]
}

6.1.2 Simulate Dataset

In this section, we will use a simulated dataset. Given below is a simulated dataset called torus that contains 12000 observations and 3 features.

Let us see how to generate data for the torus. We are using the geozoo library for this purpose. Geo Zoo (short for Geometric Zoo) is a compilation of geometric objects ranging from 3 to 10 dimensions. It contains regular or well-known objects, e.g., the cube and sphere, and some abstract objects, e.g., Boy's surface, the torus, and the hyper-torus.

Here, we load the data and store into a variable dataset_updated.

set.seed(257)
##torus data generation
torus <- geozoo::torus(p = 3,n = 12000)
dataset_updated <- data.frame(torus$points)
colnames(dataset_updated) <- c("x","y","z")
dataset_updated <- round(dataset_updated, 4)

if(nrow(dataset_updated) > 0){
  paste0("Dataset having ", nrow(dataset_updated), " row(s) and ", ncol(dataset_updated), " column(s) simulated successfully.") %>% cat("\n")
  # Round only the numeric columns in dataset
  dataset_updated <- dataset_updated %>% mutate_if(is.numeric, round, digits = 4)
  paste0("Code chunk executed successfully. The table below is showing first 20 row(s) of the dataset.") %>% cat("\n")
  # Display simulated dataset
  dataset_updated %>% head(100) %>%
    as.data.frame() %>%
    Table(scroll = TRUE, limit = 20)
}
Dataset having 12000 row(s) and 3 column(s) simulated successfully.  
Code chunk executed successfully. The table below is showing first 20 row(s) of the dataset. 
x y z
-1.0020 -2.3335 -0.8420
1.1021 -2.7447 -0.2878
-1.0033 1.2656 -0.9229
1.3204 0.6205 -0.8410
1.2998 -1.2470 -0.9801
-1.9606 2.0755 -0.5184
0.2807 -0.9724 0.1548
-1.5540 2.0564 -0.8164
-2.4653 1.6586 -0.2377
-2.3234 1.6933 -0.4841
0.2474 2.6207 -0.7747
-1.8834 1.0728 -0.9859
-0.2908 2.7138 -0.6841
-0.0234 -1.0422 0.2884
0.8715 -0.8637 -0.6343
-0.7872 -1.3636 -0.9049
0.5707 1.0214 -0.5578
-2.1770 1.5699 -0.7295
2.0941 -1.8907 -0.5704
-0.1980 2.9839 -0.1378

Now let’s try to visualize the torus (donut) in 3D Space.

knitr::include_graphics('torus_donut.png')
Figure 2: 3D Torus


6.2 Data Understanding

6.2.1 Quick Peek of the Data

Summary of Dataset

The table below shows a summary of all (numeric & categorical) columns of the dataset.

calculate_statistics <- function(column_data) {
     if (is.numeric(column_data)) {
         a <- min(column_data, na.rm = TRUE)
         b <- as.numeric(quantile(column_data, probs = 0.25, na.rm = TRUE)[1])
         c <- median(column_data, na.rm = TRUE)
         d <- mean(column_data, na.rm = TRUE)
         e <- sd(column_data, na.rm = TRUE)
         f <- as.numeric(quantile(column_data, probs = 0.75, na.rm = TRUE)[1])
         g <- max(column_data, na.rm = TRUE)
         
         # Combine the statistics into a data frame and set row names to an empty string
         stats_data <- data.frame(Min = a, Q1 = b, Median = c, Mean = d, sd = e, Q3 = f, Max = g)
         row.names(stats_data) <- ""
     } else {
         cat("Column is not numeric and was skipped.\n")
         return(NULL)
     }
     return(stats_data)
 }

# Apply the function to each column of dataset_updated
statistics_list <- lapply(dataset_updated, calculate_statistics)

# Print the result
print(statistics_list)
$x
     Min        Q1  Median         Mean      sd       Q3    Max
 -2.9991 -1.132525 0.00855 -0.003785033 1.49639 1.121475 2.9993

$y
     Min        Q1   Median        Mean       sd       Q3    Max
 -2.9999 -1.110575 -0.02285 -0.01101362 1.487875 1.106025 2.9997

$z
 Min        Q1   Median        Mean        sd       Q3 Max
  -1 -0.716225 -0.01775 -0.01075775 0.7064376 0.697175   1

Structure of Data

In the below section we can see the structure of the data.

dataset_updated %>% str()
'data.frame':   12000 obs. of  3 variables:
 $ x: num  -1 1.1 -1 1.32 1.3 ...
 $ y: num  -2.333 -2.745 1.266 0.621 -1.247 ...
 $ z: num  -0.842 -0.288 -0.923 -0.841 -0.98 ...

6.2.2 Deleting Irrelevant Columns

The cell below allows the user to drop irrelevant columns.

########################################################################################
################################## User Input Needed ###################################
########################################################################################

want_to_delete_column <- "no" ## type "yes" to remove columns

# Add column names which you want to remove
del_col <- c(" `column_name` ")

if(want_to_delete_column == "yes"){
   dataset_updated <-  dataset_updated[ , !(names(dataset_updated) %in% del_col)]
  print("Code chunk executed successfully. Overview of data types after removed selected columns")
  str( dataset_updated)
}else{
  paste0("No Columns removed. Please enter column name if you want to remove that column") %>% cat("\n")
}
No Columns removed. Please enter column name if you want to remove that column 

6.2.3 Formatting and Renaming Columns

The code below contains a user defined function to rename or reformat any column that the user chooses.

########################################################################################
################################## User Input Needed ###################################
########################################################################################

# convert the column names to lower case
colnames( dataset_updated) <- colnames( dataset_updated) %>% casefold()

## rename column ?
want_to_rename_column <- "no" ## type "yes" if you want to rename a column

## renaming a column of a dataset 
rename_col_name <- " `column_name` " ## use small letters
rename_col_name_to <- " `new_name` "

if(want_to_rename_column == "yes"){
  names( dataset_updated)[names( dataset_updated) == rename_col_name] <- rename_col_name_to
}

# remove space, comma, dot from column names
spaceless <- function(x) {colnames(x) <- gsub(pattern = "[^[:alnum:]]+",
                             replacement = ".",
                             names(x));x}
 dataset_updated <- spaceless( dataset_updated)

## below is the dataset summary
paste0("Successfully converted the column names to lower case; check the renamed column name if you changed one") %>% cat("\n")
Successfully converted the column names to lower case; check the renamed column name if you changed one 
str( dataset_updated) ## showing summary for updated 
'data.frame':   12000 obs. of  3 variables:
 $ x: num  -1 1.1 -1 1.32 1.3 ...
 $ y: num  -2.333 -2.745 1.266 0.621 -1.247 ...
 $ z: num  -0.842 -0.288 -0.923 -0.841 -0.98 ...

6.2.4 Changing Data Type of Columns

This section allows the user to change the data type of columns of his/her choice.

########################################################################################
################################## User Input Needed ###################################
########################################################################################

# If you want to change column type, change a below variable value to "yes"
want_to_change_column_type <- "no"

# you can change column type into numeric or character only
change_column_to_type <- "character" ## numeric

if(want_to_change_column_type == "yes" && change_column_to_type == "character"){
########################################################################################
################################## User Input Needed ###################################
########################################################################################
  select_columns <- c("panel_var") ###### Add column names you want to change here #####
   dataset_updated[select_columns]<- sapply( dataset_updated[select_columns],as.character)
  paste0("Code chunk executed successfully. Datatype of selected column(s) have been changed into character.")
  #str( dataset_updated)
}else if(want_to_change_column_type == "yes" && change_column_to_type == "numeric"){
  select_columns <- c('gearbox_oil_temperature')
   dataset_updated[select_columns]<- sapply( dataset_updated[select_columns],as.numeric)
  paste0("Code chunk executed successfully. Datatype of selected column(s) have been changed into numeric.")
  #str( dataset_updated)
}else{
  paste0("Datatype of columns have not been changed.") %>% cat("\n")
}
Datatype of columns have not been changed. 
dataset_updated <- do.call(data.frame, dataset_updated)
str( dataset_updated)
'data.frame':   12000 obs. of  3 variables:
 $ x: num  -1 1.1 -1 1.32 1.3 ...
 $ y: num  -2.333 -2.745 1.266 0.621 -1.247 ...
 $ z: num  -0.842 -0.288 -0.923 -0.841 -0.98 ...

6.2.5 Checking and Removing Duplicates

The presence of duplicate observations can be misleading; this section helps get rid of such rows in the dataset.

want_to_remove_duplicates <- "yes"  ## type "no" for choosing to not remove duplicates

## removing duplicate observation if present in the dataset
if(want_to_remove_duplicates == "yes"){
  
   dataset_updated <-  dataset_updated %>% unique()
  paste0("Code chunk executed successfully, duplicates if present successfully removed. Updated dataset has ", nrow( dataset_updated), " row(s) and ", ncol( dataset_updated), " column(s)") %>% print()
  cat("\n")
  str( dataset_updated) ## showing summary for updated dataset
} else{
  paste0("Code chunk executed successfully, NO duplicates were removed") %>% print()
}
[1] "Code chunk executed successfully, duplicates if present successfully removed. Updated dataset has 12000 row(s) and 3 column(s)"

'data.frame':   12000 obs. of  3 variables:
 $ x: num  -1 1.1 -1 1.32 1.3 ...
 $ y: num  -2.333 -2.745 1.266 0.621 -1.247 ...
 $ z: num  -0.842 -0.288 -0.923 -0.841 -0.98 ...

6.2.6 List of Numerical and Categorical Column Names

# Return the column type 
CheckColumnType <- function(dataVector) {
  #Check if the column type is "numeric" or "character" & decide type accordingly
  if (class(dataVector) == "integer" || class(dataVector) == "numeric") {
    columnType <- "numeric"
  } else { columnType <- "character" }
  #Return the result
  return(columnType)
}
### Loading the list of numeric columns in variable
numeric_cols <<- colnames( dataset_updated)[unlist(sapply( dataset_updated, 
                                                       FUN = function(x){ CheckColumnType(x) == "numeric"}))]

### Loading the list of categorical columns in variable
cat_cols <- colnames( dataset_updated)[unlist(sapply( dataset_updated, 
                                                   FUN = function(x){ 
                                                     CheckColumnType(x) == "character"|| CheckColumnType(x) == "factor"}))]

paste0("Code chunk executed successfully, list of numeric and categorical variables created.") %>% cat()
Code chunk executed successfully, list of numeric and categorical variables created.
paste0("\n\n Numerical Column(s): \n Count : ", length(numeric_cols), "\n") %>% cat()


 Numerical Column(s): 
 Count : 3
paste0(numeric_cols) %>% print()
[1] "x" "y" "z"
paste0("\n Categorical Column(s): \n Count : ", length(cat_cols), "\n") %>% cat()

 Categorical Column(s): 
 Count : 0
paste0(cat_cols) %>% print()
character(0)

6.2.7 Filtering Dataset for Analysis

In this section, the dataset can be filtered for required row(s) for further analysis.

want_to_filter_dataset <- "no" ## type "yes" in case you want to filter
filter_col <- " "  ## Enter Column name to filter
filter_val <- " "        ## Enter Value to exclude for the column selected

if(want_to_filter_dataset == "yes"){
   dataset_updated <- filter_at( dataset_updated
                              , vars(contains(filter_col))
                              , all_vars(. != filter_val))
  
  paste0("Code chunk executed successfully, dataset filtered successfully on required columns. Updated dataset has ", nrow( dataset_updated), " row(s) and ", ncol( dataset_updated), " column(s)") %>% print()
  cat("\n")
  str( dataset_updated) ## showing summary for updated dataset
  
} else{
  paste0("Code chunk executed successfully, entire dataset is available for analysis.") %>% print()
}
[1] "Code chunk executed successfully, entire dataset is available for analysis."

6.2.8 Missing Value Analysis

Missing values in the training data can lead to a biased model because the behavior and relationship of those values with other variables has not been analyzed correctly. They can lead to wrong calculations or classifications. Missing values are commonly of 3 types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

Missing Value on Entire dataset

na_total <- sum(is.na( dataset_updated))/prod(dim( dataset_updated))
if(na_total == 0){
  paste0("In the uploaded dataset, there is no missing value") %>% cat("\n")
}else{
  na_percentage <- paste0(sprintf(na_total*100, fmt = '%#.2f'),"%")
  paste0("Percentage of missing value in entire dataset is ",na_percentage) %>% cat("\n")
}
In the uploaded dataset, there is no missing value 

Missing Value on Column-level

The following code visualizes the missing values (if any) using a bar chart.

The gg_miss_upset function is used to visualize the patterns of missingness, or rather the combinations of missingness across cases.

If any missing values are present, this function shows us:

# Below code gives you missing value in each column
paste0("Number of missing value in each column") %>% cat("\n")
Number of missing value in each column 
print(sapply( dataset_updated, function(x) sum(is.na(x))))
x y z 
0 0 0 
missing_col_names <- names(which(sapply( dataset_updated, anyNA)))

total_na <- sum(is.na( dataset_updated))
# visualize the missing values (if any) using bar chart
if(total_na > 0 && length(missing_col_names) > 1){
  paste0("Code chunk executed successfully. Visualizing the missing values using bar chart") %>% cat("\n")
  gg_miss_upset( dataset_updated,
  nsets = 10,
  nintersects = NA)
}else if(total_na > 0){
   dataset_updated %>%
  DataExplorer::plot_missing() 
  # paste0("Code chunk executed successfully. Only one column ",missing_col_names," have missing values ", sum(is.na( dataset_updated)),".") %>% cat("\n")
}else{
  paste("Code chunk executed successfully. No missing value exist.") %>% cat("\n")
}
Code chunk executed successfully. No missing value exist. 

Missing Value Treatment

In this section, the user can decide how to tackle missing values in the dataset. Both column(s) and row(s) can be removed from the dataset, should the user choose to do so.

Drop Column(s) with Missing Values

The below code accepts user input and deletes the specified column.

########################################################################################
################################## User Input Needed ###################################
########################################################################################

# Do you want to drop a specific column?
drop_column_name_na <- "yes" ## type "yes" to drop column(s)
# write the column name(s) that you want to drop
drop_column_name <- c(" ") # enter column name
if(drop_column_name_na == "yes"){
  names_df=names( dataset_updated) %in% drop_column_name
  dataset_updated <-  dataset_updated[ , which(!names( dataset_updated) %in% drop_column_name)]
  paste0("Code chunk executed, selected column(s) dropped successfully.") %>% print()
  cat("\n")
  str( dataset_updated)
} else {
  paste0("Code chunk executed, missing value not removed (if any).") %>% cat("\n")
  cat("\n")
}
[1] "Code chunk executed, selected column(s) dropped successfully."

'data.frame':   12000 obs. of  3 variables:
 $ x: num  -1 1.1 -1 1.32 1.3 ...
 $ y: num  -2.333 -2.745 1.266 0.621 -1.247 ...
 $ z: num  -0.842 -0.288 -0.923 -0.841 -0.98 ...

Drop Row(s) with Missing Values

The below code accepts user input and deletes rows.

# Do you want to drop row(s) containing "NA"
drop_row <- "no" ## type "yes" to delete missing value observations
if(drop_row == "yes"){
  
  # imputing blank with NAs and removing all rows containing NAs
  #  dataset_updated[ dataset_updated == ""] <- NA
  # removing missing values from data
   dataset_updated <-  dataset_updated %>% na.omit()
  
  paste0("Code chunk executed, missing values successfully identified and removed. Updated dataset has ", nrow( dataset_updated), " row(s) and ", ncol( dataset_updated), " column(s)") %>% print()
  cat("\n")
  # str( dataset_updated)
  
} else{
  paste0("Code chunk executed, missing value(s) not removed (if any).") %>% cat("\n")
  cat("\n")
}
Code chunk executed, missing value(s) not removed (if any). 

6.2.9 One-Hot Encoding

This technique encodes categorical values as binary (1 or 0) indicator columns. It is used here for categorical variables with 2 classes. This is done because classification models can only handle features that have numeric values.
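As a quick illustration (with a hypothetical two-level column), stats::model.matrix performs this encoding:

```r
# One 0/1 indicator column per level of a factor
df <- data.frame(class = factor(c("yes", "no", "yes")))
model.matrix(~ class - 1, df)
```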

Given below is the number of unique values in each categorical column

cat_cols <-
  colnames(dataset_updated)[unlist(sapply(
    dataset_updated,
    FUN = function(x) {
      CheckColumnType(x) == "character" ||
        CheckColumnType(x) == "factor"
    }
  ))]

apply(dataset_updated[cat_cols], 2, function(x) {
  length(unique(x))
})
integer(0)

Selecting categorical columns with a small number of unique values for dummification

########################################################################################
################################## User Input Needed ###################################
########################################################################################
# Do you want to dummify the categorical variables?

dummify_cat <- FALSE ## TRUE,FALSE

# Select the columns on which dummification is to be performed
dum_cols <- c("location.type","class")
########################################################################################
[1] "One-Hot Encoding was not performed on dataset."

6.2.10 Check for Singularity

# Check data for singularity
singular_cols <- sapply(dataset_updated,function(x) length(unique(x))) %>%  # convert to dataframe
  data.frame(Unique_n = .) %>% dplyr::filter(Unique_n == 1) %>% 
  rownames() %>% data.frame(Constant_Variables = .)

if(nrow(singular_cols) != 0) {                              
  singular_cols  %>% DT::datatable()
} else {
  paste("There are no singular columns in the dataset") %>% htmltools::HTML()
}
There are no singular columns in the dataset
# Display variance of columns
data <- dataset_updated %>% dplyr::summarise_if(is.numeric, var) %>% t() %>% 
  data.frame() %>% round(3) #%>% DT::datatable(colnames = "Variance")

colnames(data) <- c("Variance")
Table(data,scroll = FALSE)
Variance
x 2.239
y 2.214
z 0.499

6.2.11 Selecting only Numeric Cols after Dummification

numeric_cols=as.vector(sapply(dataset_updated, is.numeric))
dataset_updated=dataset_updated[,numeric_cols]
colnames(dataset_updated)
[1] "x" "y" "z"

6.2.12 Final Dataset Summary

All further operations will be performed on the following dataset. For the sake of brevity, we display only the first 10 rows.

nums <- colnames(dataset_updated)[unlist(lapply(dataset_updated, is.numeric))]
cat(paste0("Final data frame contains ", nrow( dataset_updated), " row(s) and ", ncol( dataset_updated), " column(s).","Code chunk executed. Below table showing first 10 row(s) of the dataset."))
Final data frame contains 12000 row(s) and 3 column(s).Code chunk executed. Below table showing first 10 row(s) of the dataset.
dataset_updated <-  dataset_updated %>% mutate_if(is.numeric, round, digits = 4)

dataset_updated %>% head(10) %>%
  as.data.frame() %>%
  Table(scroll = FALSE)
x y z
-1.0020 -2.3335 -0.8420
1.1021 -2.7447 -0.2878
-1.0033 1.2656 -0.9229
1.3204 0.6205 -0.8410
1.2998 -1.2470 -0.9801
-1.9606 2.0755 -0.5184
0.2807 -0.9724 0.1548
-1.5540 2.0564 -0.8164
-2.4653 1.6586 -0.2377
-2.3234 1.6933 -0.4841
      DT::datatable(
        dataset_updated %>%
          select_if(., is.numeric) %>%
          skimr::skim() %>%
          mutate_if(is.numeric, round, digits = 4) %>%
          rename_at(.vars = vars(starts_with("skim_")), .funs = funs(sub("skim_", "", .))) %>%
          rename_at(.vars = vars(starts_with("numeric.")), .funs = funs(sub("numeric.", "", .))) %>%
          select(-c(type, n_missing, complete_rate)) %>%
          mutate(n_row = nrow(dataset_updated),
                 n_missing = rowSums(is.na(.))
                 # ,n_non_missing = n_row - n_missing
                 ) ,
        selection = "none",
        # filter = "top",
        class = 'cell-border stripe',
        escape = FALSE,
        options = options,
        callback = htmlwidgets::JS(
          "var tips = ['Index showing column number',
                        'Columns used for building the HVT model',
                        'Histogram for individual column',
                        'Number of records for each feature',
                        'Number of missing (NA) values for each feature',
                        'Mean of individual column',
                        'Standard deviation of individual column',
                        '0th Percentile means that the values are smaller than all 100% of the rows',
                        '25th Percentile means that the values are bigger than 25% and smaller than only 75% of the rows',
                        '50th Percentile means that the values are bigger than 50% and smaller than only 50% of the rows',
                        '75th Percentile means that the values are bigger than 75% and smaller than only 25% of the rows',
                        '100th Percentile means that the values are bigger than 100% of the rows'],
                            header = table.columns().header();
                        for (var i = 0; i < tips.length; i++) {
                          $(header[i]).attr('title', tips[i]);
                        }"
        )
      )
#print(
 aa <-  dataset_updated %>%
    select_if(., is.numeric) %>%
    skimr::skim() %>%
    mutate_if(is.numeric, round, digits = 4) %>%
    rename_at(.vars = vars(starts_with("skim_")), .funs = funs(sub("skim_", "", .))) %>%
    rename_at(.vars = vars(starts_with("numeric.")), .funs = funs(sub("numeric.", "", .))) %>%
    select(-c(type, n_missing, complete_rate)) %>%
    mutate(n_row = nrow(dataset_updated),
           n_missing = rowSums(is.na(.))
           # ,n_non_missing = n_row - n_missing
           )
Table(aa,scroll = TRUE, limit = 20)
variable mean sd p0 p25 p50 p75 p100 hist n_row n_missing
x -0.0038 1.4964 -2.9991 -1.1325 0.0086 1.1215 2.9993 ▅▇▇▇▅ 12000 0
y -0.0110 1.4879 -2.9999 -1.1106 -0.0229 1.1060 2.9997 ▅▇▇▇▅ 12000 0
z -0.0108 0.7064 -1.0000 -0.7162 -0.0178 0.6972 1.0000 ▇▃▃▃▇ 12000 0

6.3 Data Distribution

Variable Histograms

Shown below is the distribution of all the variables in the dataset.

eda_cols <- names(dataset_updated)
# Here we plot the distribution of columns selected by user for numerical transformation
dist_list <- lapply(1:length(eda_cols), function(i){
generateDistributionPlot(dataset_updated, eda_cols[i]) })
do.call(gridExtra::grid.arrange, args = list(grobs = dist_list, ncol = 2, top = "Distribution of Features"))

Box Plots

In this section, we plot box plots for each numeric column in the dataset. These plots display the median and interquartile range (IQR) of each column.

## the below function helps plotting quantile outlier plot for multiple variables
quantile_outlier_plots_fn <- function(data, outlier_check_var, numeric_cols = numeric_cols){
    # lower threshold
    lower_threshold <- stats::quantile(data[, outlier_check_var], .25,na.rm = TRUE) - 1.5*(stats::IQR(data[, outlier_check_var], na.rm = TRUE))
    
    # upper threshold
    upper_threshold <- stats::quantile(data[,outlier_check_var],.75,na.rm = TRUE) + 1.5*(stats::IQR(data[,outlier_check_var],na.rm = TRUE))
    
    # Look for outliers based on thresholds
    data$QuantileOutlier <- data[,outlier_check_var] > upper_threshold | data[,outlier_check_var] < lower_threshold

  # Plot box plot
  quantile_outlier_plot <- ggplot2::ggplot(data, ggplot2::aes(x="", y = data[,outlier_check_var])) +
             ggplot2::geom_boxplot(fill = 'blue',alpha=0.7) + 
             ggplot2::theme_bw() + 
             ggplot2::theme(panel.border=ggplot2::element_rect(size=0.1),panel.grid.minor.x=ggplot2::element_blank(),panel.grid.major.x=ggplot2::element_blank(),legend.position = "bottom") + ggplot2::ylab(outlier_check_var) + ggplot2::xlab("")
  data <- cbind(data[, !names(data) %in% c("QuantileOutlier")] %>% round(2), outlier = data[, c("QuantileOutlier")])
  data <- cbind(data)  
  return(list(quantile_outlier_plot, data, lower_threshold, upper_threshold))
}
## the below code gives the interactive plot for Quantile Outlier analysis for numerical variables 
box_plots <- list()
for (x in names(dataset_updated)) {

box_plots[[x]] <- quantile_outlier_plots_fn(data = dataset_updated, outlier_check_var = x)[[1]]

}

gridExtra::grid.arrange(grobs = box_plots, ncol = 3)

Correlation Matrix

In this section we calculate the Pearson correlation, a bivariate measure of the linear relationship between two numeric columns. The output shown is a matrix.
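A minimal sketch of how such a matrix can be computed with base R, assuming `dataset_updated` holds the numeric columns shown earlier:

```r
# Pearson correlation matrix of the numeric columns
cor_matrix <- stats::cor(dataset_updated, method = "pearson")
round(cor_matrix, 3)
```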

6.4 Train - Test Split

Let us first split the data into train and test sets randomly. We will use 80% of the data for training and the remainder for testing.

## 80% of the sample size
smp_size <- floor(0.80 * nrow(dataset_updated))

## set the seed to make your partition reproducible
set.seed(279)
train_ind <- sample(seq_len(nrow(dataset_updated)), size = smp_size)

dataset_updated_train <- dataset_updated[train_ind, ]
dataset_updated_test <- dataset_updated[-train_ind, ]

The train data contains 9600 rows and 3 columns. The test data contains 2400 rows and 3 columns.

6.4.1 Train Distribution

eda_cols <- names(dataset_updated_train)
# Here we plot the distribution of columns selected by user for numerical transformation
dist_list <- lapply(1:length(eda_cols), function(i){
generateDistributionPlot(dataset_updated_train, eda_cols[i]) })
do.call(gridExtra::grid.arrange, args = list(grobs = dist_list, ncol = 2, top = "Distribution of Features"))

6.4.2 Test Distribution

eda_cols <- names(dataset_updated_test)
# Here we plot the distribution of columns selected by user for numerical transformation
dist_list <- lapply(1:length(eda_cols), function(i){
generateDistributionPlot(dataset_updated_test, eda_cols[i]) })
do.call(gridExtra::grid.arrange, args = list(grobs = dist_list, ncol = 2, top = "Distribution of Features"))

Note: The steps of compression, projection, and tessellation are performed iteratively until a minimum compression rate of 80% is achieved. Once the desired compression is attained, the resulting model object is used for scoring with the scoreHVT() function.

In this section, all the workflow steps outlined in the abstract (Compression, Projection, Tessellation and Scoring) are executed at level 1.
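The iterate-until-compressed procedure described in the note can be sketched as a simple loop. This is an illustration, not package code: the doubling schedule for n_cells is an assumption, and the compression field is read from the summary printed by compressionSummaryTable.

```r
# Sketch: grow n_cells until at least 80% of cells fall below quant.err
n_cells <- 100
repeat {
  hvt_model <- trainHVT(
    dataset_updated_train,
    n_cells = n_cells,
    depth = 1,
    quant.err = 0.1,
    projection.scale = 10,
    normalize = FALSE,
    distance_metric = "L1_Norm",
    error_metric = "max",
    quant_method = "kmeans"
  )
  compression <- hvt_model[[3]]$compression_summary$percentOfCellsBelowQuantizationErrorThreshold
  if (compression >= 0.80) break  # compression target reached
  n_cells <- n_cells * 2          # assumed growth schedule
}
```

The manual iterations below (100, 450, 900 cells) follow the same logic step by step.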

6.5 Step 1: Data Compression

The core function for compression in the workflow is HVQ, which is called within the trainHVT function. trainHVT has a parameter called the quantization error threshold, which determines the number of levels in the hierarchy: if there are 'n' levels in the hierarchy, then all the clusters formed up to that level have a quantization error equal to or greater than the threshold. The user defines the number of clusters in the first level of the hierarchy; each first-level cluster is then subdivided into the same number of clusters, and this process continues, dividing each group into smaller clusters, as long as the threshold quantization error is exceeded. The output of this technique is hierarchically arranged vector-quantized data.
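To make the idea of a per-cell quantization error concrete, here is an independent k-means sketch on synthetic data. This mirrors, but is not, the package's internal computation; the L1 distance and max error are chosen to match the distance_metric and error_metric used later.

```r
set.seed(240)
pts <- matrix(rnorm(3000), ncol = 3)            # stand-in for the training data
km  <- stats::kmeans(pts, centers = 100, iter.max = 100)
# Manhattan (L1) distance of each point to its assigned centroid
l1_dist <- rowSums(abs(pts - km$centers[km$cluster, ]))
# Max error per cell: cells above quant.err are candidates for further splitting
quant_err_per_cell <- tapply(l1_dist, km$cluster, max)
mean(quant_err_per_cell < 0.1)                  # fraction of cells meeting a 0.1 threshold
```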

However, let’s try to comprehend the trainHVT function first before moving on.

trainHVT(
  dataset,
  min_compression_perc,
  n_cells,
  depth,
  quant.err,
  projection.scale,
  normalize = TRUE,
  distance_metric = c("L1_Norm", "L2_Norm"),
  error_metric = c("mean", "max"),
  quant_method = c("kmeans", "kmedoids"),
  diagnose = TRUE,
  hvt_validation = FALSE,
  train_validation_split_ratio = 0.8
)

Each of the parameters of trainHVT function have been explained below:

The output of trainHVT function (list of 6 elements) have been explained below:

We will use the trainHVT function to compress our data while preserving essential features of the dataset. Our goal is to achieve a data compression of at least 80%. In situations where the compression ratio does not meet the desired target, we can adjust the model parameters, such as the quantization error threshold or the number of cells, and rerun the trainHVT function.

In our example, we will iteratively increase the number of cells until the desired compression percentage is reached, rather than raising the quantization error threshold, because a higher threshold may reduce the level of detail captured in the data representation.

Iteration 1:

We will pass the model parameters listed below, along with the torus training dataset (containing 9600 data points), to the trainHVT function.

Model Parameters

  • Number of Cells at each Level = 100
  • Maximum Depth = 1
  • Quantization Error Threshold = 0.1
  • Error Metric = Max
  • Distance Metric = Manhattan
set.seed(240)
hvt.torus <- trainHVT(
  dataset_updated_train,
  n_cells = 100,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = FALSE,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans"
)

Let’s check out the compression summary.

compressionSummaryTable(hvt.torus[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 100 0 0 n_cells: 100 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans

As can be seen from the table above, none of the 100 cells have reached the quantization error threshold. Therefore, we can further subdivide the cells by increasing the n_cells parameter and check whether the desired compression (80%) is reached.

Let’s take a look at the 1D projection of this iteration. The hvq output from the above iteration is passed to the plotHVT function, which applies Sammon’s projection to 1D using the MASS package. The resulting 1D Sammon’s points are matched to their corresponding cell IDs and plotted as a plotly object.

plotHVT(heatmap = '1D')

Figure 3: Sammon’s 1D x Cell ID plot for layer 1 shown for the 100 cells in the dataset ’torus’

Iteration 2:

Let’s retry by increasing the n_cells parameter to 450, again using the torus training dataset (containing 9600 data points).

Model Parameters

  • Number of Cells at each Level = 450
  • Maximum Depth = 1
  • Quantization Error Threshold = 0.1
  • Error Metric = Max
  • Distance Metric = Manhattan
set.seed(240)
hvt.torus2 <- trainHVT(
  dataset_updated_train,
  n_cells = 450,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = FALSE,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans"
)

Let’s check out the compression summary again.

compressionSummaryTable(hvt.torus2[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 450 104 0.23 n_cells: 450 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans

It can be observed from the table above that only 104 cells out of 450, i.e. 23% of the cells, reached the quantization error threshold. Therefore, we can further subdivide the cells by increasing the n_cells parameter and check whether 80% compression is reached.

plotHVT(heatmap = '1D')

Figure 4: Sammon’s 1D x Cell ID plot for layer 1 shown for the 450 cells in the dataset ’torus’

Iteration 3:

Since we are yet to achieve a compression of at least 80%, let’s try again by increasing the n_cells parameter to 900, again using the torus training dataset (containing 9600 data points).

Model Parameters

  • Number of Cells at each Level = 900
  • Maximum Depth = 1
  • Quantization Error Threshold = 0.1
  • Error Metric = Max
  • Distance Metric = Manhattan
set.seed(240)
hvt.torus3 <- trainHVT(
  dataset_updated_train,
  n_cells = 900,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = FALSE,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans"
)

Let’s check the compression summary for torus.

compressionSummaryTable(hvt.torus3[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 900 759 0.84 n_cells: 900 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans

By increasing the number of cells to 900, we were able to compress 84% of the data, so we will not subdivide the cells further.

Having achieved the target compression with the n_cells parameter set to 900, the next step is to perform data projection on the compressed data. In this step, the compressed data will be projected onto a lower-dimensional space to visualize and analyze it in a more manageable form.

plotHVT(heatmap = '1D')

Figure 5: Sammon’s 1D x Cell ID plot for layer 1 shown for the 900 cells in the dataset ’torus’

6.6 Step 2: Data Projection

Sammon’s projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality while attempting to preserve the structure of inter-point distances in the projection. The centroids are plotted in 2D after performing Sammon’s projection at every level of the tessellation.
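A standalone sketch of Sammon’s mapping using the MASS package; the random points here are stand-ins for the compressed centroids:

```r
library(MASS)
set.seed(240)
pts <- matrix(runif(300), ncol = 3)     # 100 stand-in centroids in 3D
sam <- MASS::sammon(dist(pts), k = 2)   # iteratively minimises Sammon's stress
head(sam$points)                        # 2D coordinates, one row per centroid
```

The `k = 2` argument sets the output dimensionality; `k = 1` yields the 1D projections shown in the figures above.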

Iteration 1:

Let’s view the projected 2D coordinates after performing Sammon’s projection on the compressed data for the first iteration, where we set the n_cells parameter to 100. For the sake of brevity, we display the first six rows.

hvt_torus_coordinates <-hvt.torus[[2]][[1]][["1"]]
centroids <<- list()
  coordinates_value <- lapply(1:length(hvt_torus_coordinates), function(x){
    centroids <-hvt_torus_coordinates[[x]]
    coordinates <- centroids$pt
  })
centroid_coordinates<<- do.call(rbind.data.frame, coordinates_value)  
colnames(centroid_coordinates) <- c("x_coord","y_coord")
centroid_coordinates$Row.No <- as.numeric(row.names(centroid_coordinates)) 
centroid_coordinates <- centroid_coordinates %>% dplyr::select(Row.No,x_coord,y_coord)
centroid_coordinates1 <- centroid_coordinates %>% data.frame() %>% round(4)
Table(head(centroid_coordinates1))
Row.No x_coord y_coord
1 1.4545 -25.3107
2 27.4200 -11.0878
3 -24.3649 -16.1494
4 -13.5273 -3.3521
5 6.0549 17.6020
6 -21.6044 -22.7822

Let’s see the Sammon’s 2D projection onto a plane with n_cells set to 100 in the first iteration.

ggplot(centroid_coordinates1, aes(x_coord, y_coord)) +
  geom_point(color = "blue") +
  labs(x = "X", y = "Y")

Figure 6: Sammon’s 2D Plot for 100 cells

Iteration 2:

Let’s view the projected 2D coordinates after performing Sammon’s projection on the compressed data for the second iteration, where we set the n_cells parameter to 450. For the sake of brevity, we display the first six rows.

hvt_torus_coordinates <-hvt.torus2[[2]][[1]][["1"]]
centroids <<- list()
  coordinates_value <- lapply(1:length(hvt_torus_coordinates), function(x){
    centroids <-hvt_torus_coordinates[[x]]
    coordinates <- centroids$pt
  })
centroid_coordinates<<- do.call(rbind.data.frame, coordinates_value)  
colnames(centroid_coordinates) <- c("x_coord","y_coord")
centroid_coordinates$Row.No <- as.numeric(row.names(centroid_coordinates)) 
centroid_coordinates <- centroid_coordinates %>% dplyr::select(Row.No,x_coord,y_coord)
centroid_coordinates2 <- centroid_coordinates %>% data.frame() %>% round(4)
Table(head(centroid_coordinates2))
Row.No x_coord y_coord
1 -28.1754 -0.6421
2 -17.0859 21.1497
3 -7.0472 -28.9582
4 1.2971 -17.6163
5 15.5526 12.4450
6 -10.5044 -29.5311

Let’s see the Sammon’s 2D projection onto a plane with n_cells set to 450 in the second iteration.

ggplot(centroid_coordinates2, aes(x_coord, y_coord)) +
  geom_point(color = "blue") +
  labs(x = "X", y = "Y")

Figure 7: Sammon’s 2D Plot for 450 cells

Iteration 3:

Let’s view the projected 2D coordinates after performing Sammon’s projection on the compressed data for the third iteration, where we set the n_cells parameter to 900. For the sake of brevity, we display the first six rows.

hvt_torus_coordinates <-hvt.torus3[[2]][[1]][["1"]]
centroids <<- list()
  coordinates_value <- lapply(1:length(hvt_torus_coordinates), function(x){
    centroids <-hvt_torus_coordinates[[x]]
    coordinates <- centroids$pt
  })
centroid_coordinates<<- do.call(rbind.data.frame, coordinates_value)  
colnames(centroid_coordinates) <- c("x_coord","y_coord")
centroid_coordinates$Row.No <- as.numeric(row.names(centroid_coordinates)) 
centroid_coordinates <- centroid_coordinates %>% dplyr::select(Row.No,x_coord,y_coord)
centroid_coordinates3 <- centroid_coordinates %>% data.frame() %>% round(4)
Table(head(centroid_coordinates3))
Row.No x_coord y_coord
1 -19.2959 22.5085
2 6.2688 26.6529
3 -26.2055 -12.5034
4 -11.0968 -12.5747
5 17.8239 -7.5784
6 -29.0343 -11.4887

Let’s see the Sammon’s 2D projection onto a plane with n_cells set to 900 in the third iteration.

ggplot(centroid_coordinates3, aes(x_coord, y_coord)) +
  geom_point(color = "blue") +
  labs(x = "X", y = "Y")

Figure 8: Sammon’s 2D Plot for 900 cells

6.7 Step 3: Tessellation

The deldir package computes the Delaunay triangulation (and hence the Dirichlet or Voronoi tessellation) of a planar point set according to the second (iterative) algorithm of Lee and Schacter. For subsequent levels, a transformation is performed on the 2D coordinates to bring all the points within their parent tile. Tessellations are plotted using these transformed points as centroids. plotHVT is the main function to plot the hierarchical Voronoi tessellation.
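A minimal sketch of computing and plotting a Voronoi tessellation directly with deldir; random stand-in points are used here in place of the projected centroids:

```r
library(deldir)
set.seed(240)
x <- runif(20, -30, 30)     # stand-in 2D centroid coordinates
y <- runif(20, -30, 30)
dd <- deldir(x, y)          # Delaunay triangulation + Dirichlet (Voronoi) tessellation
plot(dd, wlines = "tess")   # draw only the Voronoi (tessellation) edges
```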

Now let’s try to understand plotHVT function. The parameters have been explained in detail below:

plotHVT(hvt.results, line.width, color.vec, pch1 = 21, palette.color = 6, centroid.size = 1.5, title = NULL, maxDepth = NULL, dataset, child.level, hmap.cols, previous_level_heatmap = TRUE, show.points = FALSE, asp = 1, ask = TRUE, tess.label = NULL, quant.error.hmap = NULL, n_cells.hmap = NULL, label.size = 0.5, sepration_width = 7, layer_opacity = c(0.5, 0.75, 0.99), dim_size = 1000, heatmap = '2Dhvt')

Iteration 1:

To enhance visualization, let’s generate a plot of the Voronoi tessellation for the first iteration where we set n_cells parameter as 100. This plot will provide a visual representation of the Voronoi regions corresponding to the data points, aiding in the analysis and understanding of the data distribution.

plotHVT(
  hvt.torus,
  line.width = c(0.4),
  color.vec = c("#141B41"),
  centroid.size = 0.6,
  maxDepth = 1, 
  heatmap = '2Dhvt'
)

Figure 9: The Voronoi tessellation for layer 1 shown for the 100 cells in the dataset ’torus’

Iteration 2:

Now, let’s plot the Voronoi tessellation for the second iteration where we set n_cells parameter to 300.

plotHVT(
  hvt.torus2,
  line.width = c(0.4),
  color.vec = c("#141B41"),
  centroid.size = 0.6,
  maxDepth = 1,
  heatmap = '2Dhvt'
)

Figure 10: The Voronoi tessellation for layer 1 shown for the 300 cells in the dataset ’torus’

Iteration 3:

Now, let’s plot the Voronoi tessellation again, for the third iteration where we set n_cells parameter to 900.

plotHVT(
  hvt.torus3,
  line.width = c(0.4),
  color.vec = c("#141B41"),
  centroid.size = 0.6,
  maxDepth = 1,
  heatmap = '2Dhvt'
)

Figure 11: The Voronoi tessellation for layer 1 shown for the 900 cells in the dataset ’torus’

From the presented plot, the inherent structure of the donut can be easily observed in the two-dimensional space.

Heat Maps

We will now overlay each feature as a heatmap on the Voronoi tessellation plot for better visualization and interpretation of data patterns, trends, and distributions.

The heatmaps displayed below provide a visual representation of the spatial characteristics of the torus, allowing us to observe patterns and trends in the distribution of each feature (n, x, y and z). The green shades highlight regions with higher values in each heatmap, while the indigo shades indicate areas with the lowest values. By analyzing these heatmaps, we can gain insights into the variations and relationships among these features within the torus structure.

plotHVT(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "n",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.8,
  show.points = TRUE,
  quant.error.hmap = 0.1,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 12: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for No. of entities in each cell in the ’torus’ dataset

plotHVT(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "x",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.8,
  show.points = TRUE,
  quant.error.hmap = 0.1,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 13: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for variable x in the ’torus’ dataset

plotHVT(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "y",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.8,
  show.points = TRUE,
  quant.error.hmap = 0.1,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 14: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for variable y in the ’torus’ dataset

plotHVT(
  hvt.torus3,
  torus_df,
  child.level = 1,
  hmap.cols = "z",
  line.width = c(0.4),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.8,
  show.points = TRUE,
  quant.error.hmap = 0.1,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 15: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for variable z in the ’torus’ dataset

6.8 Step 4: Scoring (scoreHVT)

Testing Dataset

Let’s have a look at our randomly selected test dataset (containing 2400 points) before we pass it to the scoreHVT function for scoring.

Table(head(dataset_updated_test))
x y z
1 -1.0020 -2.3335 -0.8420
18 -2.1770 1.5699 -0.7295
19 2.0941 -1.8907 -0.5704
20 -0.1980 2.9839 -0.1378
21 -1.2495 -1.9487 -0.9491
23 2.0634 0.0815 -0.9979

However, let’s try to comprehend the scoreHVT function first before moving on.

scoreHVT(data,
         hvt.results.model,
         child.level,
         mad.threshold,
         line.width,
         color.vec,
         normalize,
         seed,
         distance_metric,
         error_metric,
         yVar)

The important parameters for the function scoreHVT are as below:

Now that we have built the model, let us score our test dataset (containing 2400 data points) to determine which cell and level each point belongs to.

set.seed(240)
scoring_torus <- scoreHVT(
  dataset_updated_test,
  hvt.torus3,
  child.level = 1,
  line.width = c(1.2),
  color.vec = c("#141B41"),
  normalize = FALSE
)

Let’s see which cell and level each point belongs to and check the mean absolute difference for each of the 2400 records. For the sake of brevity, we show only the first 100 rows.

Act_pred_Table <- scoring_torus[["actual_predictedTable"]]
rownames(Act_pred_Table) <- NULL
Act_pred_Table %>% head(100) %>%as.data.frame() %>%Table(scroll = TRUE, limit = 100)
Row.No act_x act_y act_z Cell.ID pred_x pred_y pred_z diff
1 -1.0020 -2.3335 -0.8420 202 -1.0851700 -2.4147600 -0.7558500 0.0835267
18 -2.1770 1.5699 -0.7295 861 -2.1292556 1.5659222 -0.7611222 0.0277815
19 2.0941 -1.8907 -0.5704 31 1.9080933 -1.9164733 -0.7048667 0.1154156
20 -0.1980 2.9839 -0.1378 854 -0.2887000 2.9528750 -0.2405500 0.0748250
21 -1.2495 -1.9487 -0.9491 362 -1.2923091 -1.7790091 -0.9778636 0.0804212
23 2.0634 0.0815 -0.9979 244 2.0724500 0.2440000 -0.9956000 0.0579500
29 1.5182 0.9935 -0.9826 463 1.3512154 0.9516308 -0.9358462 0.0852026
32 0.4102 0.9552 -0.2784 538 0.3457091 0.9675455 -0.2320818 0.0410515
36 1.5409 2.3249 0.6142 676 1.5638778 2.2722333 0.6419333 0.0344593
42 -0.9558 -0.6764 0.5591 458 -0.9201818 -0.7053455 0.5426182 0.0270152
44 -0.0790 1.8156 -0.9832 725 0.0008000 1.8245545 -0.9820727 0.0299606
49 0.7280 0.6856 0.0014 480 0.6877800 0.7274300 -0.0134700 0.0323067
54 -2.8278 1.0012 0.0209 880 -2.7548375 1.1802750 0.0115062 0.0871437
56 0.3594 -1.8433 -0.9925 184 0.2463571 -1.8138714 -0.9840143 0.0503190
57 -0.9926 -0.9557 0.7829 416 -0.9617875 -1.0542250 0.8160937 0.0541771
59 -2.5097 1.0868 -0.6782 852 -2.5404375 1.0374250 -0.6662125 0.0307000
73 -0.1015 1.7005 -0.9550 699 -0.0637462 1.6219385 -0.9259000 0.0484718
74 1.7236 1.5863 0.9395 504 1.6509667 1.6511222 0.9370556 0.0466333
76 1.6227 -1.6186 0.9564 78 1.5690000 -1.6285750 0.9638000 0.0236917
82 1.3105 -0.4772 0.7960 248 1.2363941 -0.4988353 0.7445353 0.0490686
87 -1.0363 -0.2099 0.3338 509 -1.0012923 -0.3195385 0.3123000 0.0553821
93 1.3842 0.3319 0.8170 342 1.4072444 0.3028556 0.8272333 0.0207741
103 -1.2680 1.0389 0.9327 729 -1.2901000 0.9838875 0.9245500 0.0284208
106 -2.6369 -0.5918 -0.7117 731 -2.6860357 -0.5990071 -0.6504357 0.0392024
117 0.7185 2.1057 -0.9744 713 0.5689563 2.0649125 -0.9853125 0.0670812
120 2.3827 0.6634 0.8809 251 2.4310429 0.8484786 0.8125000 0.1006071
125 -1.7146 -1.6026 0.9379 376 -1.6712182 -1.7395182 0.9032455 0.0716515
127 1.3901 2.2967 -0.7289 711 1.3906444 2.3950667 -0.6309667 0.0656148
129 -1.1141 -0.0945 -0.4715 557 -1.1109900 -0.0442700 -0.4591900 0.0218833
130 2.4161 -1.5692 0.4732 14 2.4900700 -1.5184800 0.3777300 0.0733867
157 1.7900 0.0487 -0.9779 268 1.8017000 0.1732500 -0.9806250 0.0463250
161 -0.3151 1.1740 -0.6202 647 -0.3628667 1.1364833 -0.5909167 0.0381889
167 -1.4384 2.6024 -0.2288 889 -1.3192571 2.6792857 -0.1169286 0.1026333
168 0.5732 0.8297 0.1298 507 0.5264400 0.8553600 0.0951600 0.0356867
183 -2.0384 -1.7248 -0.7422 454 -1.9883900 -1.6312000 -0.8137000 0.0717033
185 0.4926 0.9927 0.4523 512 0.5844900 0.9532700 0.4726100 0.0505433
186 2.1445 0.4366 -0.9821 240 2.2592625 0.3932125 -0.9540375 0.0620708
188 -2.1026 -0.5622 0.9843 648 -2.1583100 -0.6864500 0.9591800 0.0683600
189 1.0673 1.6906 1.0000 576 1.1020909 1.7104909 0.9977545 0.0189758
194 1.0620 0.2168 -0.4009 392 1.0518556 0.1971778 -0.3683000 0.0207889
198 -1.2506 0.9453 -0.9017 701 -1.2482417 0.8175083 -0.8601167 0.0572444
199 1.7293 -2.3387 -0.4177 13 1.5891133 -2.4615133 -0.3437867 0.1123044
212 -2.7339 -0.5932 0.6032 741 -2.7985429 -0.5934000 0.5024286 0.0552048
218 -1.4642 0.0022 0.8444 665 -1.5482316 0.1035579 0.8897105 0.0769000
223 0.4330 -2.3813 0.9074 72 0.4716933 -2.4465000 0.8646800 0.0488711
224 2.7641 0.7309 -0.5118 234 2.7097889 0.8299556 -0.5467111 0.0627593
227 -2.5086 0.7689 0.7815 820 -2.3963176 0.7616412 0.8506706 0.0629039
237 1.4787 -0.2794 0.8688 225 1.4504333 -0.4393556 0.8747444 0.0647222
239 -0.7918 -0.7288 -0.3828 452 -0.7919111 -0.7162667 -0.3605556 0.0116296
243 -0.4083 -1.0176 -0.4286 387 -0.4781000 -1.0273400 -0.4965300 0.0491567
251 0.3045 1.0556 0.4330 549 0.3059667 1.0604667 0.4431333 0.0054889
252 -0.5818 0.9918 0.5266 644 -0.6431000 0.9506800 0.5235200 0.0351667
255 -1.3209 -1.2284 0.9806 449 -1.4615636 -1.1593364 0.9888636 0.0726636
256 -1.3814 0.7997 0.9148 706 -1.3934444 0.7314000 0.9038889 0.0304185
262 2.0533 -1.8681 0.6308 16 1.9888667 -1.9071000 0.6485444 0.0403926
264 0.9554 -0.3904 0.2514 325 0.9924647 -0.3202941 0.2858235 0.0471980
265 1.9167 -0.6986 -0.9992 146 1.9041545 -0.8491273 -0.9951182 0.0557182
268 0.4723 2.6730 0.6997 794 0.2629286 2.6439571 0.7482714 0.0956619
271 0.5408 -0.9755 0.4664 283 0.4755111 -0.9722778 0.3943778 0.0468444
281 2.8851 0.7376 0.2091 200 2.8578111 0.8311444 0.1842778 0.0485519
283 -1.7929 1.0316 -0.9977 792 -1.7619125 1.1592500 -0.9910375 0.0551000
296 2.9771 -0.3583 0.0534 82 2.9718571 -0.2340714 0.1667714 0.0809476
304 1.9528 -0.0645 0.9989 183 2.0231636 -0.1042273 0.9959818 0.0376697
305 1.6524 -2.2377 -0.6237 25 1.6610467 -2.2227667 -0.6242867 0.0080556
311 -0.5211 -1.8901 0.9992 205 -0.4357933 -1.8622733 0.9909867 0.0404489
312 -2.3845 0.6975 -0.8749 817 -2.3404154 0.8306154 -0.8722385 0.0599538
316 -2.4536 0.1933 -0.8873 757 -2.4429091 -0.0059364 -0.8919727 0.0715333
325 2.2249 1.0713 -0.8830 391 2.2511769 1.1186231 -0.8542923 0.0341026
335 -0.8697 -2.1554 -0.9460 233 -0.9630400 -2.1712600 -0.9227200 0.0441600
336 0.8484 -0.9514 0.6884 252 0.7775636 -0.9132091 0.5981000 0.0664424
339 0.6624 -1.5950 -0.9620 170 0.7259000 -1.6756556 -0.9829333 0.0550296
344 2.2880 -0.8983 0.8890 81 2.2376737 -1.0123632 0.8869158 0.0554912
347 0.3260 -1.1945 0.6478 256 0.3239000 -1.1878000 0.6390381 0.0058540
348 0.6464 -0.8397 -0.3404 302 0.6431800 -0.8024300 -0.2359100 0.0483267
349 1.6026 2.5123 0.1994 707 1.6242818 2.4704636 0.2757545 0.0466242
358 1.1013 -0.0970 0.4471 335 1.0856600 -0.1450700 0.4231600 0.0292167
369 -0.4813 -1.5246 -0.9160 321 -0.4077133 -1.4099933 -0.8435267 0.0868889
373 1.3371 -0.1317 -0.7544 292 1.4104000 -0.1188857 -0.8103000 0.0473381
374 -0.9502 -0.3186 0.0659 493 -0.9271714 -0.3796714 0.0297786 0.0400738
376 2.0180 -1.1153 0.9521 90 1.9730250 -1.1611250 0.9559500 0.0315500
379 -2.0152 2.0241 0.5166 891 -1.8077692 2.2154308 0.5009231 0.1381462
382 -1.0988 1.5419 -0.9943 760 -1.0603250 1.4528083 -0.9772250 0.0482139
384 -1.9070 2.2917 -0.1921 897 -2.0360417 2.1327917 -0.2942167 0.1300222
386 -0.7320 -1.3653 -0.8926 359 -0.7223538 -1.4142462 -0.9080231 0.0246718
393 0.3054 -0.9587 0.1109 316 0.2870400 -0.9827800 0.2186800 0.0500733
402 0.7924 1.1246 0.7812 525 0.7858300 1.2282900 0.8379900 0.0556833
408 -0.8322 0.5737 0.1468 622 -0.8053375 0.6721375 0.3069250 0.0951417
411 -0.7373 -0.6758 -0.0178 429 -0.6672200 -0.7481900 0.0678500 0.0760400
412 -1.9716 -2.2501 -0.1285 356 -2.0890467 -2.1122867 -0.2001400 0.1089667
414 -1.8333 -0.7709 0.9999 518 -1.6911842 -0.7873579 0.9864579 0.0573386
430 1.7131 -2.3705 -0.3806 13 1.5891133 -2.4615133 -0.3437867 0.0839378
440 -1.0690 0.0172 -0.3653 547 -1.0425818 -0.0349364 -0.2963909 0.0491545
446 2.6297 -1.2025 -0.4528 23 2.6806778 -1.2774000 -0.2282556 0.1168074
448 -1.4048 -2.4167 -0.6061 235 -1.3858429 -2.3431143 -0.6812643 0.0559024
450 -1.4259 0.4853 0.8696 688 -1.4962833 0.4009556 0.8919833 0.0590370
459 -1.7295 2.1032 0.6909 877 -1.8655308 1.9109231 0.7362615 0.1245564
476 -0.7591 -1.6176 -0.9770 276 -0.6239929 -1.6889857 -0.9780714 0.0691881
484 -2.5488 0.6408 -0.7781 828 -2.5578625 0.7549875 -0.7374250 0.0546417
488 -1.2734 0.1981 0.7029 587 -1.2802947 0.0021842 0.6918789 0.0712772
491 0.8795 0.4824 -0.0794 434 0.8795000 0.4767286 0.0370571 0.0407095
# Histogram of the absolute differences between actual and predicted centroids
hist(Act_pred_Table$diff, breaks = 20, col = "blue",
     main = "Mean Absolute Difference", xlab = "Difference",
     xlim = c(0, 0.20), ylim = c(0, 500))

Figure 16: Mean Absolute Difference

7. Example II: HVT with the Personal Computer dataset

Data Understanding

In this section, we will use the Prices of Personal Computers dataset, which contains 6259 observations and 10 features. The dataset records the prices of 486 personal computer configurations in the US between 1993 and 1995. The variables include price, speed, ram, screen, and cd. The dataset can be downloaded from here.

To load the dataset, refer to section 6.1.1.

NOTE: We have already executed the data processing steps for the Prices of Personal Computers dataset and stored the train and test splits as csv files, so we load those csv files directly from local storage.

After processing the Prices of Personal Computers dataset, we have 6183 rows and 7 features, since rows with missing values have been dropped along with the categorical columns.
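The preprocessing described above can be sketched as follows. The toy data frame, the seed, and the 80/20 split proportion are illustrative assumptions; this is not the exact code used to produce the stored csv files.

```r
# Toy data frame mirroring the structure of Computers.csv (values illustrative)
computers <- data.frame(
  price  = c(1499, 1795, NA, 1849),
  speed  = c(25, 33, 25, 25),
  hd     = c(80, 85, 170, 170),
  ram    = c(4, 2, 4, 8),
  screen = c(14, 14, 15, 14),
  cd     = c("no", "no", "no", "no"),
  multi  = c("no", "no", "no", "no"),
  premium = c("yes", "yes", "yes", "no"),
  ads    = c(94, 94, 94, 94),
  trend  = c(1, 1, 1, 1)
)

computers <- na.omit(computers)                # drop rows with missing values
numeric_cols <- sapply(computers, is.numeric)  # drop the categorical columns
computers <- computers[, numeric_cols]

set.seed(279)                                  # illustrative seed
idx <- sample(nrow(computers), floor(0.8 * nrow(computers)))
trainComputers <- computers[idx, ]
testComputers  <- computers[-idx, ]
```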

The current features are: price, speed, hd, ram, screen, ads, and trend.

Let's explore the raw Prices of Personal Computers dataset (6259 points). For the sake of brevity, we display only the first six rows.

# Quick peek
computers <- read.csv("https://raw.githubusercontent.com/Mu-Sigma/HVT/master/vignettes/sample_dataset/Computers.csv")
computers <- computers[,-1]
knitr::kable(head(computers))
price speed hd ram screen cd multi premium ads trend
1499 25 80 4 14 no no yes 94 1
1795 33 85 2 14 no no yes 94 1
1595 25 170 4 15 no no yes 94 1
1849 25 170 8 14 no no no 94 1
3295 33 340 16 14 no no yes 94 1
3695 66 340 16 14 no no yes 94 1

Training Dataset

Now, let's have a look at the randomly selected training dataset containing 4946 data points. For the sake of brevity, we display only the first six rows.

trainComputers <- read.csv("./sample_dataset/trainComputers.csv")
trainComputers <- trainComputers[,-1]
knitr::kable(head(trainComputers))
price speed hd ram screen ads trend
2348 33 245 4 14 216 13
1469 25 120 4 14 216 13
2225 66 340 8 15 275 12
2575 50 250 8 15 139 5
2943 33 1000 24 14 248 20
2425 66 212 8 15 267 15

Testing Dataset

Now, let's have a look at the randomly selected testing dataset containing 1237 data points. For the sake of brevity, we display only the first six rows.

testComputers <- read.csv("./sample_dataset/testComputers.csv")
testComputers <- testComputers[,-1]
knitr::kable(head(testComputers))
price speed hd ram screen ads trend
1595 25 170 4 15 94 1
1849 25 170 8 14 94 1
2575 50 210 4 15 94 1
2195 33 170 8 15 94 1
2295 25 245 8 14 94 1
2699 50 212 8 14 94 1

Now that we are familiar with the structure of the computers data, we will follow the steps below to score the Computers dataset.

7.1 Step 1: Data Compression

For more detailed information on Data Compression, please refer to section 2 of this vignette.

We will use the trainHVT function to compress the data while preserving the essential features of the dataset. Our goal is to achieve a compression of at least 80%. If the compression ratio does not meet the desired target, we can adjust the model parameters, for example by modifying the quantization error threshold or increasing the number of cells, and then rerun the trainHVT function.

In our example, we will iteratively increase the number of cells until the desired compression percentage is reached, rather than increasing the quantization threshold, because a higher threshold may reduce the level of detail captured in the data representation.
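The search strategy can be sketched as a simple loop. Here `run_compression` is a hypothetical stand-in for calling trainHVT and reading percentOfCellsBelowQuantizationErrorThreshold from its compression summary; the starting value and step size are also illustrative.

```r
# Stand-in for: train with n_cells, return the fraction of cells whose
# quantization error falls below the threshold (a toy monotone function)
run_compression <- function(n_cells) {
  min(1, n_cells / 500)
}

n_cells <- 100
target  <- 0.80
repeat {
  pct <- run_compression(n_cells)
  if (pct >= target) break      # desired compression percentage reached
  n_cells <- n_cells + 50       # illustrative step size
}
```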

We will pass the model parameters below, along with the computers training dataset (4946 rows), to the trainHVT function.

Model Parameters

set.seed(240)
hvt.results <- list()
hvt.results <- trainHVT(trainComputers,
                        n_cells = 450,               # number of cells
                        depth = 1,                   # single level of compression
                        quant.err = 0.2,             # quantization error threshold
                        projection.scale = 10,       # scale factor for the projection
                        normalize = TRUE,            # standardize the features
                        distance_metric = "L1_Norm", # distance used for the error
                        error_metric = "max",        # aggregate error within a cell
                        quant_method = "kmeans",     # clustering method
                        diagnose = FALSE)

Now let's check the compression summary. The table below shows, for each level, the number of cells, the number of cells with quantization error below the threshold, and the percentage of cells with quantization error below the threshold.

compressionSummaryTable(hvt.results[[3]]$compression_summary)
segmentLevel noOfCells noOfCellsBelowQuantizationError percentOfCellsBelowQuantizationErrorThreshold parameters
1 450 369 0.82 n_cells: 450 quant.err: 0.2 distance_metric: L1_Norm error_metric: max quant_method: kmeans

As the table above shows, 82% of the cells have quantization error below the threshold. Since we have attained the desired compression percentage, we will not subdivide the cells further.
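The reported percentage is simply the ratio of cells below the quantization error threshold to the total number of cells, as a quick check shows:

```r
# Reproduce the percentage reported in the compression summary
noOfCells <- 450
noOfCellsBelowQuantizationError <- 369
pct <- round(noOfCellsBelowQuantizationError / noOfCells, 2)  # 0.82
```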

hvt.results[[3]] gives detailed information about the hierarchically vector-quantized data.

hvt.results[[3]][['summary']] returns a table containing the number of points, the quantization error, and the codebook for each cell.

The datatable displayed below is the summary from hvt.results, showing the Cell.ID, centroids, and quantization error for each of the 450 cells.

summaryTable(hvt.results[[3]]$summary)
Segment.Level Segment.Parent Segment.Child n Cell.ID Quant.Error price speed hd ram screen ads trend
1 1 1 3 339 0.08 0.03 -0.10 2.23 -0.05 -0.66 -0.48 0.78
1 1 2 17 268 0.22 0.51 0.68 0.10 -0.05 0.45 0.94 -0.13
1 1 3 3 183 0.22 -0.07 -1.02 -0.71 -0.52 0.45 -0.33 0.61
1 1 4 15 89 0.08 -1.08 -0.90 -0.78 -0.76 -0.66 0.27 0.49
1 1 5 9 217 0.14 0.65 0.66 -0.41 -0.05 -0.66 0.36 -1.10
1 1 6 16 107 0.18 -0.31 -0.94 -0.85 -0.76 0.45 0.98 -0.04
1 1 7 11 138 0.13 0.11 -0.10 -0.66 -0.05 -0.66 -1.53 -1.62
1 1 8 6 427 0.1 -0.57 2.26 1.66 -0.05 0.45 -2.31 2.05
1 1 9 13 294 0.13 0.33 2.26 -0.05 -0.05 -0.66 1.51 0.12
1 1 10 19 112 0.17 0.36 -0.90 -0.71 -0.16 0.45 -1.64 -1.69
1 1 11 14 165 0.11 -0.29 -1.11 0.06 -0.05 -0.66 0.70 -0.22
1 1 12 4 411 0.07 0.35 0.66 2.23 -0.05 2.67 -0.68 1.10
1 1 13 5 442 0.22 0.25 2.26 3.05 1.36 0.45 -2.04 1.93
1 1 14 10 197 0.13 -0.79 -0.90 -0.03 -0.05 0.45 0.42 0.28
1 1 15 10 289 0.17 0.54 0.51 0.43 -0.05 -0.66 -0.40 0.69
1 1 16 16 270 0.09 1.10 -0.90 0.26 1.36 -0.66 0.35 -1.02
1 1 17 18 225 0.17 0.45 -0.10 -0.29 -0.05 0.45 0.58 -0.91
1 1 18 6 175 0.1 -0.38 -1.02 0.43 -0.05 -0.66 1.36 0.08
1 1 19 16 32 0.13 -0.21 -0.94 -0.75 -0.05 -0.66 -1.65 -1.72
1 1 20 8 320 0.11 0.33 2.26 -0.05 -0.05 0.45 1.46 0.11
1 1 21 4 428 0.15 -0.09 0.09 2.62 1.36 0.45 -1.97 1.80
1 1 22 9 166 0.16 -0.10 2.26 -0.67 -0.76 -0.66 1.33 0.07
1 1 23 16 28 0.09 -1.06 -1.11 -1.18 -1.11 -0.66 0.34 -0.99
1 1 24 6 49 0.04 -1.10 -1.27 -0.78 -0.76 -0.66 1.14 0.02
1 1 25 10 195 0.24 0.63 -0.42 -0.34 -0.76 0.45 0.98 -0.03
1 1 26 12 239 0.15 -0.18 0.66 -0.16 -0.05 0.45 0.56 -0.31
1 1 27 4 13 0.06 -1.85 -0.99 -1.15 -1.11 -0.66 0.94 -0.24
1 1 28 9 117 0.15 -0.75 -0.10 -0.38 -0.76 -0.66 0.63 -0.62
1 1 29 24 241 0.09 -0.30 0.66 0.01 -0.05 -0.66 0.14 0.47
1 1 30 13 168 0.11 -0.32 -0.90 -0.32 -0.05 0.45 0.71 -0.71
1 1 31 8 192 0.16 -0.30 0.66 -0.35 -0.76 0.45 0.74 -0.52
1 1 32 9 54 0.06 -1.07 -0.90 -0.80 -0.76 -0.66 0.35 -1.04
1 1 33 7 203 0.13 -0.86 0.34 -0.29 -0.05 -0.66 0.36 0.42
1 1 34 27 407 0.12 1.29 -0.90 2.23 2.76 -0.66 0.35 0.41
1 1 35 19 351 0.41 1.58 0.26 -0.46 -0.50 2.67 0.98 0.03
1 1 36 10 450 0.32 1.59 1.28 4.53 4.17 0.89 -2.45 2.37
1 1 37 15 306 0.12 -0.49 0.66 0.29 -0.05 0.45 -0.63 0.91
1 1 38 14 43 0.08 -1.20 -0.90 -1.18 -1.11 -0.66 0.77 -0.17
1 1 39 29 234 0.15 -0.61 -0.90 0.43 -0.05 -0.66 -0.64 0.98
1 1 40 12 87 0.09 -1.00 -0.90 -0.29 -0.76 -0.66 1.36 0.08
1 1 41 5 194 0.09 0.37 -0.90 0.47 -0.05 -0.66 0.54 -0.75
1 1 42 14 444 0.52 1.43 0.68 2.31 2.66 2.67 -0.89 1.27
1 1 43 6 295 0.15 1.15 -0.90 -0.07 1.36 0.45 0.58 -1.04
1 1 44 9 45 0.08 -1.00 -0.10 -1.18 -1.11 -0.66 1.33 0.07
1 1 45 18 27 0.13 0.00 0.66 -1.06 -0.78 -0.66 -1.63 -1.67
1 1 46 8 5 0.16 -0.52 -1.04 -0.69 -0.76 2.67 1.41 0.10
1 1 47 13 61 0.19 -1.15 -0.98 -0.73 -0.78 0.45 1.33 0.03
1 1 48 7 429 0.06 -0.03 0.66 1.66 1.36 0.45 -2.43 2.34
1 1 49 15 352 0.35 1.44 0.31 -0.09 1.36 -0.66 -1.25 -1.51
1 1 50 27 155 0.16 -0.13 0.74 -0.78 -0.76 -0.66 0.88 -0.04
1 1 51 9 134 0.24 -0.28 0.16 -0.93 -0.87 0.45 -0.72 -1.29
1 1 52 6 346 0.13 0.66 2.26 0.32 -0.05 0.45 1.50 0.12
1 1 53 6 303 0.1 0.39 2.26 -0.02 -0.05 -0.66 0.04 0.40
1 1 54 14 34 0.07 -1.19 -1.27 -1.06 -0.76 -0.66 0.35 -1.02
1 1 55 12 67 0.05 -0.59 -0.90 -0.78 -0.76 -0.66 1.02 -1.00
1 1 56 7 179 0.14 0.15 0.55 -0.71 -0.76 0.45 0.86 -0.78
1 1 57 12 262 0.09 0.32 -0.90 0.43 1.36 -0.66 0.66 0.14
1 1 58 10 296 0.21 0.45 -1.05 0.43 1.36 0.45 0.87 -0.03
1 1 59 4 216 0.11 0.38 0.66 -0.78 -0.58 0.45 0.82 -0.11
1 1 60 17 328 0.26 2.47 0.61 0.38 -0.30 0.45 0.56 -0.79
1 1 61 17 63 0.09 -0.52 -1.27 -1.03 -0.76 -0.66 0.64 -0.58
1 1 62 13 105 0.14 -0.29 -0.90 -0.70 -0.76 0.45 0.70 -0.84
1 1 63 6 240 0.06 0.68 -0.10 -0.58 -0.05 0.45 -1.11 -1.38
1 1 64 9 19 0.09 -1.19 -1.23 -1.18 -1.11 -0.66 0.91 -0.86
1 1 65 19 345 0.15 0.97 0.66 0.35 1.36 0.45 0.50 0.19
1 1 66 11 94 0.14 -0.65 0.66 -0.97 -0.79 -0.66 0.99 -0.96
1 1 67 8 18 0.17 -0.57 0.66 -1.19 -1.02 -0.66 -1.35 -1.50
1 1 68 9 237 0.18 1.02 -0.18 -0.35 -0.05 0.45 0.86 -0.24
1 1 69 16 349 0.12 -0.45 0.66 1.20 -0.05 0.45 -0.71 1.09
1 1 70 6 326 0.15 1.17 -0.10 0.38 1.36 0.45 0.50 -0.58
1 1 71 13 157 0.09 -0.36 -1.04 0.03 -0.05 -0.66 1.57 0.14
1 1 72 4 386 0.26 2.05 0.08 0.39 -0.05 2.67 -0.37 -1.23
1 1 73 8 253 0.08 0.70 -1.27 0.46 1.36 -0.66 0.95 -0.91
1 1 74 6 301 0.22 1.97 0.66 -0.32 -0.52 0.45 0.23 0.29
1 1 75 11 37 0.11 -1.14 -1.00 -1.18 -1.08 -0.66 0.64 -0.60
1 1 76 12 188 0.14 0.25 -0.93 -0.52 -0.05 0.45 -0.90 -1.33
1 1 77 11 136 0.15 0.78 -0.90 -0.40 -0.76 -0.66 0.44 -0.67
1 1 78 5 72 0.13 -1.39 -0.81 -0.67 -0.76 -0.66 -0.08 -0.37
1 1 79 25 73 0.18 -1.58 -0.94 -0.78 -0.77 -0.66 -0.43 0.87
1 1 80 29 372 0.26 -0.73 0.35 0.30 -0.76 2.67 -0.98 1.40
1 1 81 8 127 0.08 0.08 -0.90 -0.29 -0.76 -0.66 0.76 -0.38
1 1 82 13 52 0.18 -1.62 -1.01 -0.79 -0.81 -0.66 0.21 0.47
1 1 83 10 50 0.11 -1.59 -0.90 -0.78 -0.76 -0.66 -0.88 1.26
1 1 84 9 313 0.26 0.84 -0.98 -0.09 1.36 0.45 -0.95 -1.36
1 1 85 5 153 0.19 1.44 0.36 -0.83 -0.48 -0.66 -1.67 -1.79
1 1 86 8 114 0.09 0.12 -0.90 -0.75 -0.76 -0.66 -0.08 -0.37
1 1 87 5 383 0.32 1.59 2.26 1.40 -0.05 -0.66 -0.21 0.65
1 1 88 10 310 0.12 0.51 2.26 0.32 -0.05 -0.66 0.55 0.37
1 1 89 9 277 0.05 0.26 -0.90 0.43 1.36 -0.66 0.04 0.40
1 1 90 13 47 0.09 -0.68 -0.93 -1.18 -1.11 -0.66 0.71 -0.64
1 1 91 15 403 0.19 -1.03 2.26 0.43 -0.48 -0.66 -2.29 2.08
1 1 92 14 123 0.17 0.12 0.23 -0.79 -0.76 -0.66 -0.83 -1.31
1 1 93 29 415 0.31 0.63 2.26 1.41 1.36 0.45 -0.93 1.38
1 1 94 15 62 0.15 0.37 -0.90 -0.70 -0.05 -0.66 -1.62 -1.66
1 1 95 18 12 0.08 -0.39 -0.90 -1.01 -0.76 -0.66 -1.62 -1.66
1 1 96 5 83 0.12 -0.28 -0.97 -0.75 -0.76 0.45 -0.81 -1.31
1 1 97 16 424 0.08 1.32 -0.90 2.23 2.76 0.45 -0.69 0.88
1 1 98 10 231 0.15 0.36 -0.02 -0.55 -0.05 0.45 -0.61 -1.26
1 1 99 7 39 0.15 -1.10 -1.06 -0.92 -0.86 0.45 0.96 -0.93
1 1 100 11 276 0.15 1.55 0.66 0.28 -0.05 -0.66 1.05 -0.08
1 1 101 9 300 0.08 0.71 -0.10 0.43 1.36 -0.66 1.38 0.09
1 1 102 8 325 0.08 0.88 0.66 0.43 1.36 -0.66 1.36 0.08
1 1 103 16 298 0.19 0.84 -1.01 0.42 1.36 0.45 0.57 -0.75
1 1 104 8 286 0.11 1.00 -1.04 0.43 1.36 -0.66 -0.08 -0.37
1 1 105 19 98 0.15 -0.74 -0.98 -0.66 -0.76 0.45 0.59 -0.63
1 1 106 11 416 0.33 3.09 0.66 -0.03 1.36 2.67 0.43 -0.92
1 1 107 5 389 0.03 0.05 -0.10 1.66 1.36 0.45 -0.79 1.42
1 1 108 13 425 0.12 1.34 -0.10 2.23 2.76 0.45 -0.53 0.86
1 1 109 14 260 0.14 0.69 0.66 -0.07 -0.05 0.45 0.53 -0.82
1 1 110 7 190 0.18 -1.02 -0.10 -0.43 -0.76 0.45 -0.07 0.63
1 1 111 11 413 0.08 1.66 0.66 2.23 2.76 -0.66 0.36 0.42
1 1 112 10 14 0.07 -1.05 -0.90 -1.08 -0.90 -0.66 -1.11 -1.38
1 1 113 4 369 0.03 0.74 -0.10 1.16 1.36 0.45 -0.22 0.65
1 1 114 13 254 0.11 0.67 -1.16 0.36 1.36 -0.66 0.37 -0.88
1 1 115 25 199 0.2 -0.97 0.66 -0.73 -0.76 -0.66 -0.65 1.03
1 1 116 10 76 0.11 -0.45 -0.10 -1.14 -1.04 -0.66 0.35 -1.00
1 1 117 13 148 0.22 -0.73 -0.71 -0.32 -0.73 2.67 0.36 0.41
1 1 118 13 97 0.1 -0.33 0.66 -1.18 -1.11 -0.66 0.77 -0.72
1 1 119 4 327 0.39 0.72 -0.90 2.80 -0.23 0.17 0.73 -0.17
1 1 120 9 3 0.13 -1.24 -1.27 -1.00 -0.84 -0.66 -1.29 -1.44
1 1 121 3 391 0.04 1.03 2.26 0.17 -0.05 2.67 0.69 0.27
1 1 122 11 448 0.24 0.82 2.26 3.28 2.76 0.45 -2.42 2.27
1 1 123 6 213 0.18 0.56 0.73 -0.29 -0.76 -0.66 0.85 0.25
1 1 124 3 409 0.06 2.88 2.26 1.19 -0.05 0.45 0.06 0.53
1 1 125 6 249 0.21 0.87 0.66 -0.11 -0.05 -0.66 0.01 -0.09
1 1 126 5 228 0.06 0.44 -0.10 0.11 -0.05 -0.66 -0.08 -0.37
1 1 127 11 93 0.11 0.20 -1.07 -0.89 -0.76 -0.66 0.67 -0.55
1 1 128 14 423 0.26 -0.71 2.26 1.97 -0.20 -0.66 -2.23 2.01
1 1 129 10 282 0.1 0.23 0.66 0.43 -0.05 0.45 0.69 0.27
1 1 130 9 113 0.18 -1.15 -1.06 -0.29 -0.05 -0.66 1.00 -0.04
1 1 131 7 170 0.21 -0.79 -0.21 -0.12 -0.66 0.45 0.82 -0.07
1 1 132 14 340 0.18 1.72 0.66 0.33 1.36 -0.66 0.28 -0.80
1 1 133 18 126 0.2 -0.42 -0.10 -0.84 -0.86 0.45 0.61 -0.79
1 1 134 10 357 0.22 0.01 0.66 2.23 -0.05 -0.66 -0.39 0.87
1 1 135 19 292 0.28 0.62 -0.10 -0.23 -0.16 2.67 0.65 -0.53
1 1 136 24 261 0.18 -0.07 0.61 0.09 -0.05 0.45 1.46 0.11
1 1 137 11 433 0.25 2.03 0.52 2.23 2.76 0.45 -0.16 0.64
1 1 138 16 90 0.12 -0.42 -0.97 -0.82 -0.76 -0.66 0.77 -0.17
1 1 139 22 431 0.16 1.08 0.66 2.23 2.76 0.45 -0.81 1.19
1 1 140 8 48 0.07 -0.78 -0.99 -1.18 -1.11 -0.66 -0.08 -0.37
1 1 141 9 189 0.09 -1.52 -0.90 0.04 -0.76 0.45 -0.59 0.95
1 1 142 4 124 0.05 0.08 -0.90 -0.29 -0.76 -0.66 1.36 0.08
1 1 143 14 307 0.34 -0.19 0.50 -0.30 -0.41 2.67 0.26 0.48
1 1 144 13 381 0.22 2.95 0.66 0.16 -0.05 -0.41 -1.68 -1.77
1 1 145 9 36 0.09 -1.48 -1.02 -1.01 -0.76 -0.66 0.34 -0.96
1 1 146 4 215 0.18 -1.07 -0.10 -0.41 -0.05 -0.66 -0.59 0.91
1 1 147 8 255 0.08 0.45 -1.27 0.43 1.36 -0.66 0.72 -0.24
1 1 148 20 223 0.24 -0.17 -0.90 0.29 -0.05 -0.66 -0.21 0.60
1 1 149 7 305 0.14 -1.34 0.66 -0.26 -0.56 0.45 -1.05 1.49
1 1 150 9 337 0.13 1.61 0.66 0.44 1.36 -0.66 0.93 -0.89
1 1 151 9 322 0.13 0.06 0.66 0.39 -0.05 0.45 -0.67 1.03
1 1 152 7 178 0.13 0.10 0.66 -0.20 -0.76 -0.66 0.61 -0.73
1 1 153 13 278 0.14 -0.35 0.66 0.16 -0.05 0.45 -0.03 0.54
1 1 154 8 82 0.16 -0.91 -0.99 -0.89 -0.89 0.45 -0.08 -0.37
1 1 155 11 42 0.06 -0.72 -0.90 -1.17 -0.98 -0.66 1.02 -1.00
1 1 156 12 316 0.13 -0.83 2.26 -0.31 -0.76 -0.66 -1.00 1.46
1 1 157 7 252 0.09 0.18 -1.27 0.43 1.36 -0.66 1.27 0.05
1 1 158 16 150 0.13 0.23 -0.97 -0.42 -0.05 -0.66 0.36 -1.07
1 1 159 14 66 0.2 -0.59 0.12 -1.01 -0.88 -0.66 -0.75 -1.29
1 1 160 7 224 0.15 -0.18 2.26 -0.83 -0.86 0.45 1.26 0.11
1 1 161 7 331 0.18 0.35 0.78 0.11 1.36 0.45 1.14 0.13
1 1 162 14 356 0.1 -0.78 1.08 0.48 -0.05 0.45 -0.99 1.47
1 1 163 13 291 0.19 0.96 -1.01 -0.29 1.36 -0.66 -1.61 -1.64
1 1 164 12 367 0.19 0.01 0.66 0.18 -0.11 2.67 -0.54 0.98
1 1 165 13 290 0.1 0.23 0.66 0.35 -0.05 0.45 0.19 0.49
1 1 166 13 420 0.08 0.88 -0.90 2.23 2.76 0.45 -0.65 1.06
1 1 167 18 4 0.12 -0.73 -1.11 -1.03 -0.76 -0.66 -1.65 -1.72
1 1 168 11 397 0.07 -0.20 0.97 0.49 -0.05 2.67 -1.22 1.60
1 1 169 2 121 0.08 1.15 -0.50 -0.68 -0.76 -0.66 -0.61 -1.26
1 1 170 12 376 0.12 -0.51 2.26 0.48 -0.05 0.45 -1.08 1.49
1 1 171 7 23 0.08 -1.43 -1.27 -1.17 -1.01 -0.66 0.62 -0.17
1 1 172 9 162 0.14 0.02 -0.10 -0.15 -0.76 -0.66 0.84 -0.22
1 1 173 5 167 0.11 -0.70 0.06 -0.51 -0.76 0.45 -0.08 -0.37
1 1 174 10 158 0.09 -0.14 -1.16 0.04 -0.05 -0.66 0.39 -0.77
1 1 175 21 177 0.11 0.13 -0.97 0.04 -0.05 -0.66 0.80 -0.28
1 1 176 4 404 0.09 -0.22 2.26 2.23 -0.05 -0.66 -1.04 1.48
1 1 177 7 285 0.15 -0.60 -0.10 0.38 -0.05 0.45 -0.49 0.93
1 1 178 9 110 0.16 -0.29 -0.81 -0.68 -0.05 -0.66 -0.89 -1.33
1 1 179 13 394 0.44 2.96 0.66 0.21 -0.11 2.67 0.86 -0.39
1 1 180 4 359 0.09 -0.89 2.26 -0.29 -0.76 0.45 -1.04 1.48
1 1 181 2 393 0.02 1.02 2.26 0.01 -0.05 2.67 0.04 0.40
1 1 182 3 314 0.05 1.34 -0.10 0.43 1.36 -0.66 -0.08 -0.37
1 1 183 23 348 0.18 1.15 0.70 0.25 1.36 0.45 1.22 -0.03
1 1 184 11 317 0.19 1.58 0.66 -0.49 -0.37 0.45 -1.64 -1.70
1 1 185 12 77 0.14 -1.08 0.22 -1.03 -0.76 -0.66 0.34 -0.98
1 1 186 9 111 0.1 -1.68 -0.90 -0.25 -0.76 -0.66 -0.77 0.98
1 1 187 9 312 0.16 1.87 0.66 0.25 -0.05 0.45 1.03 -0.10
1 1 188 18 333 0.07 -0.81 0.66 0.44 -0.05 0.45 -0.85 1.26
1 1 189 8 390 0.4 1.87 0.08 3.65 -0.14 -0.53 0.68 -0.88
1 1 190 12 355 0.17 0.40 -0.90 1.04 1.36 0.45 -0.08 0.59
1 1 191 9 99 0.14 -0.97 -0.10 -0.65 -0.76 -0.66 1.26 0.03
1 1 192 3 447 0.51 2.12 1.19 6.45 1.36 0.45 -0.07 0.82
1 1 193 12 250 0.12 -0.62 0.66 -0.13 -0.05 0.45 0.39 0.41
1 1 194 8 358 0.2 0.51 0.66 0.38 1.36 0.45 -0.37 0.64
1 1 195 15 315 0.15 -0.89 0.66 0.31 -0.05 -0.66 -1.23 1.63
1 1 196 8 256 0.09 0.50 1.08 0.13 -0.05 -0.66 1.41 0.10
1 1 197 14 145 0.12 0.19 0.66 -0.91 -0.81 -0.66 0.65 -0.62
1 1 198 22 21 0.15 -0.18 -0.10 -0.99 -0.77 -0.66 -1.64 -1.71
1 1 199 17 180 0.11 0.34 -0.92 0.04 -0.05 -0.66 0.48 -0.69
1 1 200 11 438 0.14 0.15 2.26 2.23 1.36 0.45 -2.30 2.03
1 1 201 11 1 0.15 -1.35 -1.24 -1.17 -0.95 -0.66 -1.66 -1.75
1 1 202 3 198 0.17 0.75 -0.36 -0.46 -0.76 0.45 -0.78 -1.30
1 1 203 11 128 0.11 -0.95 -0.90 -0.08 -0.76 -0.66 0.02 0.42
1 1 204 9 30 0.14 -1.44 -1.19 -1.15 -0.87 -0.66 -0.08 -0.37
1 1 205 7 293 0.17 -1.58 0.12 0.48 -0.46 -0.66 -1.08 1.49
1 1 206 10 26 0.15 -1.31 -1.12 -1.17 -1.00 0.45 0.88 -0.15
1 1 207 3 432 0.06 2.12 2.26 2.23 1.36 0.45 1.29 0.06
1 1 208 9 10 0.09 -1.32 -1.27 -1.12 -0.84 -0.66 -0.61 -1.26
1 1 209 13 273 0.09 1.05 -1.01 0.43 1.36 -0.66 0.60 -0.56
1 1 210 16 209 0.16 -0.22 0.66 -0.37 -0.76 -0.66 -0.01 0.56
1 1 211 11 246 0.18 -1.02 0.66 -0.45 -0.57 0.45 -0.46 0.87
1 1 212 7 7 0.09 -1.63 -1.27 -1.12 -0.91 -0.66 1.57 0.14
1 1 213 13 375 0.29 1.29 0.37 -0.67 -0.27 2.67 -1.37 -1.58
1 1 214 10 176 0.05 -0.42 -0.90 0.04 -0.05 -0.66 0.69 0.27
1 1 215 9 341 0.12 0.55 2.26 0.15 -0.05 0.45 0.04 0.40
1 1 216 11 251 0.16 0.66 0.25 -0.26 -0.05 0.45 -0.08 -0.37
1 1 217 8 44 0.06 -0.58 -0.90 -1.18 -1.11 -0.66 0.35 -1.03
1 1 218 12 143 0.16 -1.35 -0.10 -0.68 -0.76 -0.66 -0.54 0.95
1 1 219 10 57 0.07 -0.51 -1.27 -0.88 -0.76 -0.66 0.35 -1.03
1 1 220 17 446 0.48 0.62 1.39 2.40 1.85 2.67 -2.30 2.11
1 1 221 10 130 0.1 -0.47 -0.90 -0.68 -0.05 0.45 0.69 -1.04
1 1 222 10 11 0.24 0.08 -1.16 -0.99 -0.76 2.67 0.77 -0.03
1 1 223 3 172 0.11 -0.84 0.16 -0.29 -0.05 -0.66 1.43 0.10
1 1 224 25 210 0.17 0.58 -0.10 0.00 -0.05 -0.66 0.61 -0.80
1 1 225 22 206 0.16 -1.40 -0.10 -0.24 -0.73 -0.66 -0.95 1.40
1 1 226 7 398 0.05 -0.02 -0.10 1.66 1.36 0.45 -1.22 1.60
1 1 227 8 201 0.14 -0.78 0.66 -0.56 -0.76 0.45 0.43 0.37
1 1 228 7 29 0.08 -1.12 -0.90 -1.18 -1.11 -0.66 1.27 0.05
1 1 229 15 41 0.23 0.74 -0.95 -0.67 -0.76 2.67 0.95 -0.01
1 1 230 3 321 0.12 0.57 -0.36 0.43 1.36 0.45 1.43 0.10
1 1 231 11 360 0.26 1.13 0.66 0.22 -0.05 2.67 0.58 -0.17
1 1 232 12 434 0.14 -0.11 0.66 2.30 1.36 0.45 -2.35 2.06
1 1 233 8 222 0.19 0.65 -0.10 -0.17 -0.05 -0.66 -0.86 -1.32
1 1 234 12 15 0.08 -0.14 -0.90 -0.75 -0.76 -0.66 -1.68 -1.78
1 1 235 5 71 0.05 -0.78 -0.97 -1.10 -0.76 -0.66 -0.08 -0.37
1 1 236 18 395 0.31 -0.96 0.58 1.94 -0.25 -0.66 -2.19 1.98
1 1 237 9 361 0.27 1.32 0.41 -0.40 1.36 0.45 -1.09 -1.40
1 1 238 6 412 0.08 -0.47 0.66 0.47 -0.05 2.67 -2.18 1.93
1 1 239 11 205 0.21 -0.84 -0.90 -0.29 -0.79 2.67 -0.40 0.85
1 1 240 9 335 0.31 2.27 0.66 0.27 -0.13 -0.54 -0.83 -1.31
1 1 241 9 218 0.16 -0.10 0.66 -0.23 -0.05 -0.66 0.69 0.09
1 1 242 9 64 0.07 -0.98 -0.10 -1.18 -1.11 -0.66 0.77 -0.21
1 1 243 4 380 0.03 0.18 -0.10 1.16 1.36 0.45 -0.93 1.29
1 1 244 12 235 0.09 0.14 0.66 0.10 -0.05 -0.66 1.36 0.08
1 1 245 11 109 0.09 -1.48 -0.90 -0.29 -0.76 -0.66 -0.29 0.75
1 1 246 3 6 0.07 -2.03 -0.90 -1.29 -1.11 -0.66 0.57 -1.00
1 1 247 6 408 0.27 2.69 -0.90 -0.28 1.36 2.67 0.18 -1.15
1 1 248 17 287 0.23 -0.82 -0.10 0.39 -0.05 -0.66 -1.02 1.48
1 1 249 12 56 0.11 -0.40 -0.93 -0.89 -0.76 -0.66 -0.61 -1.26
1 1 250 6 319 0.09 -1.07 0.66 0.34 -0.76 0.45 -1.03 1.50
1 1 251 21 80 0.08 -0.60 -0.90 -0.78 -0.76 -0.66 0.73 -0.66
1 1 252 17 378 0.11 0.51 0.66 1.19 1.36 0.45 -0.59 0.91
1 1 253 9 108 0.16 -0.85 -0.10 -0.93 -0.91 0.45 0.89 -0.18
1 1 254 8 366 0.15 -0.56 0.47 2.23 -0.05 -0.66 -0.96 1.39
1 1 255 17 214 0.11 0.55 0.66 -0.71 -0.05 -0.66 -1.62 -1.65
1 1 256 13 343 0.22 0.56 0.54 0.03 -0.05 2.67 1.22 0.03
1 1 257 12 330 0.07 0.50 2.26 0.17 -0.05 0.45 0.69 0.27
1 1 258 9 308 0.11 0.78 2.26 0.17 -0.05 -0.66 1.29 0.06
1 1 259 11 435 0.29 0.77 1.28 1.39 1.36 2.67 -0.95 1.35
1 1 260 16 96 0.14 -0.63 0.77 -1.18 -1.11 -0.66 0.96 -0.08
1 1 261 11 133 0.13 -1.24 -0.90 -0.78 -0.76 0.45 -0.58 0.94
1 1 262 21 370 0.19 -1.10 0.66 0.44 -0.05 -0.66 -2.22 2.01
1 1 263 11 347 0.12 -0.94 0.52 0.46 -0.05 0.45 -1.22 1.59
1 1 264 1 302 0 -0.06 2.26 0.04 -0.76 0.45 0.04 0.40
1 1 265 24 445 0.32 0.66 0.69 3.06 2.76 0.45 -2.27 2.09
1 1 266 9 140 0.12 -0.77 -0.10 -0.09 -0.76 -0.66 0.82 -0.15
1 1 267 15 353 0.21 -0.13 2.26 0.12 -0.05 -0.66 -0.87 1.31
1 1 268 3 449 0.04 5.42 0.66 3.00 4.17 2.67 0.68 -0.62
1 1 269 21 304 0.15 1.41 -0.10 0.38 1.36 -0.66 0.61 -0.86
1 1 270 7 204 0.11 -0.08 -1.00 0.04 -0.05 0.45 0.76 -0.11
1 1 271 10 137 0.13 0.19 0.66 -0.85 -0.69 -0.66 1.02 -1.00
1 1 272 8 275 0.06 1.21 -0.90 0.45 1.36 -0.66 0.92 -0.88
1 1 273 15 417 0.16 1.47 -0.90 2.23 2.76 0.45 -0.07 0.59
1 1 274 16 70 0.07 -0.67 -0.90 -0.90 -0.76 -0.66 0.35 -1.00
1 1 275 6 17 0.1 0.50 -0.90 -0.85 -0.76 -0.66 -1.66 -1.75
1 1 276 11 196 0.15 -0.32 -0.90 0.44 -0.12 -0.66 0.50 0.41
1 1 277 7 142 0.1 -0.48 -0.90 -0.25 -0.05 -0.66 0.64 -0.60
1 1 278 6 200 0.08 0.67 -0.90 -0.56 -0.76 0.45 0.10 0.48
1 1 279 11 388 0.23 0.12 0.66 1.20 -0.05 2.67 -0.59 1.01
1 1 280 15 422 0.46 0.90 -0.68 1.48 1.73 2.67 -0.62 0.96
1 1 281 14 284 0.24 1.26 0.66 0.22 -0.10 0.45 0.72 -0.80
1 1 282 15 405 0.23 0.11 0.72 1.72 1.36 0.45 -1.05 1.53
1 1 283 5 9 0.14 -0.59 -0.97 -0.82 -0.76 0.45 -1.67 -1.72
1 1 284 6 414 0.2 1.39 2.26 0.30 1.36 2.67 0.62 0.23
1 1 285 19 247 0.26 0.55 -0.90 -0.20 -0.05 2.67 0.69 -0.81
1 1 286 10 227 0.16 0.65 -0.10 -0.69 -0.33 0.45 -1.61 -1.64
1 1 287 4 440 0.64 3.73 1.86 1.45 0.30 2.67 -0.12 0.33
1 1 288 4 120 0.13 -0.38 0.66 -1.09 -1.02 0.45 0.81 -0.97
1 1 289 8 173 0.14 -0.32 -0.10 -0.40 -0.05 -0.66 0.83 -0.22
1 1 290 4 396 0.11 0.97 2.26 0.16 -0.05 2.67 1.57 0.14
1 1 291 14 163 0.14 -1.00 0.66 -0.67 -0.76 -0.66 0.17 0.53
1 1 292 18 101 0.17 -0.63 0.75 -0.91 -0.87 -0.66 1.55 0.14
1 1 293 5 207 0.1 -0.11 2.26 -0.77 -0.76 -0.66 0.49 0.42
1 1 294 5 135 0.11 -0.53 0.66 -1.18 -1.11 0.45 0.54 -0.37
1 1 295 17 46 0.08 -1.00 -1.27 -0.97 -0.76 -0.66 0.73 -0.64
1 1 296 8 311 0.2 1.35 -0.90 -0.56 -0.76 2.67 0.02 0.53
1 1 297 11 16 0.22 -0.33 -1.03 -0.59 -0.76 2.67 0.71 -0.68
1 1 298 6 402 0.22 1.92 0.66 0.17 1.36 2.67 0.30 -0.05
1 1 299 15 385 0.08 0.36 0.66 1.18 1.36 0.45 -0.88 1.25
1 1 300 2 103 0.06 -0.30 -0.90 0.04 -0.76 -0.66 -0.86 -1.32
1 1 301 4 181 0.17 -0.18 0.98 -0.88 -0.85 0.45 1.46 0.11
1 1 302 11 379 0.31 -1.15 0.59 0.31 -0.31 0.45 -2.30 2.09
1 1 303 2 426 0.04 3.01 0.66 2.23 1.36 -0.66 -0.86 -1.32
1 1 304 10 91 0.17 -0.72 -0.97 -0.79 -0.05 -0.66 0.58 -0.94
1 1 305 15 281 0.17 -1.28 -0.10 0.27 -0.76 0.45 -0.93 1.31
1 1 306 4 243 0.13 -1.05 0.66 -0.48 -0.05 -0.66 -0.70 0.97
1 1 307 5 154 0.1 -0.24 -0.10 -0.17 -0.76 -0.66 1.57 0.14
1 1 308 18 377 0.37 1.02 2.26 0.59 1.36 -0.42 0.25 0.45
1 1 309 9 58 0.07 -0.84 -1.27 -0.80 -0.76 -0.66 0.44 -0.71
1 1 310 8 119 0.12 -1.30 -0.10 -0.51 -0.76 -0.66 0.39 0.35
1 1 311 6 342 0.16 0.04 -0.76 -0.78 -0.05 2.67 -0.77 1.16
1 1 312 10 74 0.06 -1.09 -0.90 -0.78 -0.76 -0.66 0.74 -0.16
1 1 313 13 38 0.11 -1.39 -0.98 -0.75 -0.76 -0.66 1.57 0.14
1 1 314 3 244 0.04 0.33 -0.90 0.47 -0.05 0.45 0.69 0.27
1 1 315 9 24 0.09 -1.08 -0.90 -1.07 -0.91 -0.66 -0.61 -1.26
1 1 316 14 185 0.16 -0.53 -0.95 -0.01 -0.05 0.45 1.43 0.09
1 1 317 13 164 0.16 -0.81 -0.10 -0.67 -0.76 -0.66 -0.52 0.83
1 1 318 12 2 0.15 -1.02 -0.70 -1.27 -1.11 -0.66 -1.60 -1.64
1 1 319 12 35 0.16 -1.00 -1.02 -1.18 -1.11 0.45 0.57 -0.78
1 1 320 8 144 0.06 -0.18 -1.27 0.04 -0.05 -0.66 0.92 -0.88
1 1 321 8 149 0.15 -1.32 -0.90 -0.25 -0.76 0.45 -0.05 0.54
1 1 322 6 272 0.07 0.52 -0.90 0.43 1.36 -0.66 1.36 0.08
1 1 323 10 53 0.09 -0.61 -1.27 -0.81 -0.76 -0.66 0.92 -0.88
1 1 324 5 439 0.05 0.99 2.26 2.23 2.76 0.45 -0.82 1.39
1 1 325 17 229 0.17 0.09 -0.10 -0.12 -0.05 0.45 0.77 -0.21
1 1 326 6 161 0.14 0.73 -0.90 -0.60 -0.05 -0.66 0.56 -0.66
1 1 327 6 364 0.14 -0.28 0.28 2.23 -0.05 0.45 -0.51 0.97
1 1 328 20 257 0.14 -0.62 0.66 0.03 -0.05 -0.66 -0.44 0.81
1 1 329 11 299 0.11 1.07 -0.10 0.43 1.36 -0.66 0.78 -0.24
1 1 330 8 230 0.09 -0.35 -0.90 0.40 -0.05 0.45 0.56 0.37
1 1 331 7 410 0.15 -0.93 2.26 0.44 -0.35 0.45 -2.35 2.13
1 1 332 12 202 0.16 0.09 -0.90 0.19 -0.11 0.45 0.75 -0.69
1 1 333 18 436 0.32 2.11 2.26 2.23 2.69 -0.54 0.34 0.39
1 1 334 1 371 0 -0.38 0.66 0.47 1.36 -0.66 -1.24 1.67
1 1 335 5 232 0.16 -0.54 -0.74 -0.78 -0.05 0.45 -0.76 1.16
1 1 336 12 69 0.06 -0.69 -0.90 -1.13 -0.76 -0.66 0.67 -0.60
1 1 337 14 141 0.17 0.63 -0.10 -0.75 -0.15 -0.66 -1.65 -1.71
1 1 338 9 362 0.18 -1.37 0.58 0.43 -0.76 -0.66 -2.19 1.94
1 1 339 11 106 0.16 -0.83 -0.93 -0.15 -0.76 -0.66 0.69 -0.05
1 1 340 8 184 0.1 -0.03 -0.99 0.43 -0.05 -0.66 0.77 -0.30
1 1 341 13 160 0.11 -0.33 -0.98 -0.07 -0.05 -0.66 1.14 0.02
1 1 342 11 336 0.27 0.81 -0.75 -0.65 -0.18 2.67 -1.21 -1.49
1 1 343 11 81 0.1 -0.08 -0.90 -1.00 -0.76 -0.66 0.35 -1.01
1 1 344 14 55 0.12 -1.50 -0.90 -0.73 -0.76 -0.66 0.87 -0.06
1 1 345 7 208 0.2 0.31 -0.90 -0.15 -0.05 0.45 0.16 -0.73
1 1 346 6 269 0.19 0.67 0.66 -0.66 -0.17 0.45 -1.50 -1.58
1 1 347 5 31 0.11 -1.74 -1.27 -0.86 -0.76 -0.66 0.81 0.04
1 1 348 5 186 0.11 -0.23 -1.27 0.04 -0.05 0.45 0.46 -0.62
1 1 349 5 329 0.08 -0.30 -0.90 2.23 -0.05 -0.66 -0.77 0.98
1 1 350 11 118 0.17 0.06 -0.10 -0.77 -0.76 -0.66 0.86 -0.81
1 1 351 7 354 0.08 1.23 1.08 0.43 1.36 -0.66 1.39 0.09
1 1 352 11 132 0.16 -0.22 0.32 -0.97 -0.92 -0.66 -0.08 -0.37
1 1 353 10 266 0.19 1.14 0.66 -0.40 -0.05 0.45 0.92 -0.24
1 1 354 9 437 0.12 0.15 2.26 1.66 1.36 0.45 -2.36 2.22
1 1 355 19 60 0.11 -1.01 -0.92 -0.84 -0.76 -0.66 1.37 0.08
1 1 356 12 51 0.13 -1.11 -0.90 -0.77 -0.76 -0.66 0.94 -0.90
1 1 357 13 86 0.06 -0.32 -0.90 -0.77 -0.76 -0.66 0.35 -1.01
1 1 358 13 92 0.11 -0.17 -0.95 -0.85 -0.76 -0.66 0.57 -0.58
1 1 359 20 363 0.62 1.96 0.66 0.32 1.50 0.45 0.44 -0.55
1 1 360 12 88 0.08 -0.58 -1.09 -0.73 -0.76 -0.66 -0.08 -0.37
1 1 361 17 318 0.29 0.76 0.66 -0.26 -0.18 2.67 0.65 -0.78
1 1 362 10 68 0.12 -0.94 -0.10 -1.03 -0.79 -0.66 0.94 -0.90
1 1 363 9 182 0.15 -0.05 0.66 -0.74 -0.76 0.45 0.29 -0.79
1 1 364 7 147 0.16 -0.82 0.55 -0.83 -0.86 0.45 1.39 0.09
1 1 365 9 271 0.16 0.13 0.49 -0.36 -0.05 -0.66 -0.77 1.13
1 1 366 7 323 0.33 0.00 0.66 1.39 -0.05 0.45 0.24 0.53
1 1 367 17 102 0.11 -0.99 -0.90 -0.77 -0.76 -0.66 -0.31 0.77
1 1 368 10 40 0.07 -0.47 -0.97 -0.88 -0.76 -0.66 -1.11 -1.38
1 1 369 11 75 0.1 -0.41 -0.10 -1.18 -1.11 -0.66 0.77 -0.70
1 1 370 9 264 0.07 0.84 -0.90 0.45 1.36 -0.66 0.91 -0.86
1 1 371 12 85 0.1 -0.56 -0.90 -0.77 -0.76 -0.66 1.25 0.05
1 1 372 9 419 0.35 3.53 -0.11 2.49 1.36 -0.66 0.92 -0.83
1 1 373 24 248 0.19 -1.30 0.66 -0.27 -0.73 -0.66 -1.07 1.49
1 1 374 10 152 0.13 -1.21 -0.90 0.08 -0.76 -0.66 -0.58 0.92
1 1 375 9 151 0.26 -1.36 -1.06 -0.21 -0.05 -0.66 -0.14 0.64
1 1 376 7 406 0.18 -0.82 0.55 1.66 -0.05 0.45 -2.26 2.02
1 1 377 10 274 0.1 0.93 -0.97 0.43 1.36 -0.66 0.91 -0.23
1 1 378 9 267 0.12 -0.58 -0.90 0.43 -0.05 0.45 -0.54 0.92
1 1 379 17 171 0.11 0.25 -0.90 0.04 -0.05 -0.66 0.93 -0.88
1 1 380 18 78 0.09 -0.77 -0.90 -0.91 -0.76 -0.66 0.81 -0.19
1 1 381 11 279 0.27 2.30 0.45 0.26 -0.12 -0.66 0.68 -0.96
1 1 382 11 233 0.14 0.60 0.66 -0.46 -0.05 -0.66 -0.84 -1.31
1 1 383 9 131 0.14 -0.30 -0.90 -0.67 -0.60 -0.66 0.15 0.48
1 1 384 5 384 0.2 1.17 0.66 1.38 1.36 0.45 -0.34 0.68
1 1 385 11 401 0.2 2.01 0.66 0.04 1.36 2.67 0.93 -0.37
1 1 386 11 280 0.13 0.82 -1.03 0.14 1.36 -0.66 -0.84 -1.31
1 1 387 13 146 0.09 0.20 0.66 -0.81 -0.76 -0.66 0.35 -1.03
1 1 388 7 8 0.14 -1.20 -1.06 -1.16 -1.01 0.45 -0.75 -1.29
1 1 389 7 344 0.22 0.72 2.26 -0.41 -0.46 -0.66 -0.74 1.16
1 1 390 6 443 0.04 1.12 2.26 2.23 2.76 0.45 -1.22 1.59
1 1 391 10 399 0.28 1.46 0.66 2.36 1.36 0.34 0.53 0.35
1 1 392 5 219 0.2 1.96 -0.90 0.05 -0.05 -0.44 0.59 -0.82
1 1 393 12 258 0.12 -0.21 0.66 0.50 -0.05 -0.66 0.30 0.50
1 1 394 12 125 0.11 0.21 -0.10 -0.76 -0.76 -0.66 0.36 -0.98
1 1 395 13 226 0.17 -0.86 0.66 0.08 -0.76 -0.66 -0.24 0.70
1 1 396 22 288 0.2 -0.36 0.66 0.54 -0.05 -0.66 -0.56 0.93
1 1 397 16 100 0.12 -0.35 0.66 -1.04 -0.93 -0.66 0.35 -1.02
1 1 398 14 115 0.17 -1.16 -0.90 -0.56 -0.78 0.45 0.44 0.37
1 1 399 10 212 0.16 0.03 -0.10 0.15 -0.05 -0.66 1.20 0.02
1 1 400 12 365 0.17 -0.54 2.26 0.42 -0.05 -0.66 -1.04 1.52
1 1 401 17 387 0.23 1.48 2.26 0.43 1.36 -0.01 1.34 0.10
1 1 402 13 122 0.11 -0.39 -0.10 -0.78 -0.76 -0.66 0.80 -0.07
1 1 403 15 22 0.15 -1.30 -1.17 -1.07 -0.71 -0.66 1.02 -1.00
1 1 404 8 220 0.13 0.85 0.66 -0.22 -0.23 -0.66 1.02 -1.00
1 1 405 18 236 0.11 0.74 0.66 0.08 -0.05 -0.66 0.60 -0.69
1 1 406 8 169 0.23 -0.49 0.66 -0.18 -0.67 -0.66 0.44 -0.60
1 1 407 14 20 0.11 -1.48 -1.27 -1.17 -0.96 -0.66 1.06 -0.09
1 1 408 9 263 0.27 0.04 -0.90 0.09 -0.05 2.67 0.75 0.14
1 1 409 10 65 0.12 -1.26 -0.90 -0.71 -0.76 -0.66 0.63 -0.54
1 1 410 14 259 0.14 -0.35 -0.10 0.43 -0.05 -0.66 -0.50 0.87
1 1 411 6 159 0.18 0.92 0.03 -0.71 -0.76 -0.66 0.23 -0.71
1 1 412 4 245 0.05 1.02 0.66 -0.29 -0.05 -0.66 1.36 0.08
1 1 413 13 84 0.07 -0.97 -0.93 -0.79 -0.76 -0.66 0.69 0.27
1 1 414 16 211 0.17 0.58 -0.10 -0.20 -0.05 -0.66 0.94 -0.12
1 1 415 8 392 0.24 0.86 0.77 0.24 1.36 2.67 0.18 0.45
1 1 416 14 382 0.13 -0.31 0.84 0.45 -0.05 2.67 -0.83 1.35
1 1 417 24 33 0.11 -1.46 -1.24 -1.02 -0.76 -0.66 0.72 -0.62
1 1 418 13 59 0.1 -1.08 -1.27 -0.76 -0.76 -0.66 0.76 -0.17
1 1 419 18 430 0.17 0.70 -0.01 2.23 2.76 0.45 -1.06 1.50
1 1 420 11 221 0.12 0.04 0.66 -0.54 -0.05 0.45 0.75 -1.00
1 1 421 11 242 0.18 1.16 0.66 -0.57 -0.37 0.45 0.66 -0.82
1 1 422 6 191 0.16 -0.14 0.73 0.04 -0.76 -0.66 0.93 -0.11
1 1 423 3 421 0.06 -0.07 0.66 2.04 1.36 0.45 -1.97 1.80
1 1 424 31 297 0.17 -0.82 0.66 0.31 -0.05 -0.66 -0.81 1.24
1 1 425 10 374 0.1 0.47 -0.10 1.16 1.36 0.45 -0.58 0.92
1 1 426 15 418 0.19 0.04 2.26 0.88 -0.05 2.67 -1.04 1.46
1 1 427 10 324 0.24 0.44 -0.90 1.38 1.36 -0.66 0.29 0.50
1 1 428 7 129 0.14 0.81 -0.44 -0.46 -0.76 -0.66 1.02 -1.00
1 1 429 12 156 0.15 0.31 -0.96 -0.33 -0.05 -0.66 -0.86 -1.32
1 1 430 5 441 0.06 1.48 2.26 2.23 2.76 0.45 -0.91 1.32
1 1 431 8 373 0.15 -0.91 0.98 1.32 -0.05 0.45 -1.10 1.51
1 1 432 13 332 0.13 0.64 0.66 0.60 1.36 -0.66 0.36 0.38
1 1 433 13 187 0.14 0.08 -1.01 0.10 -0.05 -0.66 -0.08 -0.37
1 1 434 5 174 0.07 -0.69 -0.90 -0.78 -0.05 -0.66 -0.74 1.14
1 1 435 33 338 0.24 1.42 0.66 -0.09 1.36 0.45 0.56 -0.78
1 1 436 18 79 0.13 -0.16 -0.90 -0.93 -0.76 -0.66 0.92 -0.88
1 1 437 20 104 0.11 -1.07 -0.90 -0.77 -0.76 -0.66 -0.76 0.91
1 1 438 11 334 0.13 1.49 0.66 0.43 1.36 -0.66 0.76 -0.34
1 1 439 8 350 0.26 2.70 -0.40 0.08 -0.05 -0.66 -1.57 -1.65
1 1 440 15 238 0.14 0.67 0.66 -0.02 -0.05 -0.66 0.79 -0.18
1 1 441 3 400 0.07 -0.38 2.26 1.66 -0.05 0.45 -0.99 1.46
1 1 442 10 283 0.27 1.07 0.66 -0.37 -0.19 0.45 -0.86 -1.32
1 1 443 15 95 0.1 -1.48 -0.90 -0.29 -0.76 -0.66 0.40 0.42
1 1 444 3 265 0.03 -0.02 2.26 0.04 -0.76 -0.66 0.04 0.40
1 1 445 29 368 0.12 0.31 -0.90 1.16 1.36 0.45 -0.66 1.00
1 1 446 4 309 0.06 -0.16 -0.90 2.23 -0.05 -0.66 -0.32 0.78
1 1 447 9 116 0.12 -0.17 -0.94 -0.63 -0.05 -0.66 1.00 -0.97
1 1 448 10 139 0.17 -0.94 0.66 -0.50 -0.76 -0.66 0.68 -0.31
1 1 449 8 25 0.07 -0.12 2.26 -1.18 -1.11 -0.66 1.30 0.06
1 1 450 12 193 0.14 0.63 0.66 -0.63 -0.76 -0.66 -0.07 -0.30

Now let us understand what each column in the above summary table means:

All remaining columns contain the centroid values for each cell. Together, these centroids form a codebook: the collection of all centroids (codewords) that represents the compressed data.
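As an illustration (with hypothetical numbers, not taken from the table above), using a codebook amounts to a nearest-centroid lookup: a data point is represented by whichever codeword lies closest to it.

```r
# Toy codebook of 3 centroids in 2 features (hypothetical values)
codebook <- matrix(c( 0.1, -0.2,
                      1.0,  0.8,
                     -0.9,  0.4),
                   ncol = 2, byrow = TRUE)
new_point <- c(0.9, 0.7)

# Euclidean distance from the new point to every centroid
d <- sqrt(rowSums(sweep(codebook, 2, new_point)^2))
which.min(d)  # the new point is assigned to cell 2
```

This is the same principle the scoring step applies later in the workflow, with the cell centroids playing the role of codewords.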

plotHVT(hvt.results, heatmap = '1D')

Figure 17: Sammon's 1D x Cell ID plot for layer 1, shown for the 450 cells in the dataset 'computers'

7.2 Step 2: Data Projection

For more detailed information on Data Projection please refer to section 3 of this vignette.

Let's view the projected 2D centroids after performing Sammon's projection on the compressed data (450 cells) obtained from vector quantization. For the sake of brevity, we are displaying the first six rows.

library(dplyr)

hvt_torus_coordinates <- hvt.results[[2]][[1]][["1"]]
# Extract the (x, y) Sammon coordinates of each cell's centroid
coordinates_value <- lapply(seq_along(hvt_torus_coordinates), function(x) {
  hvt_torus_coordinates[[x]]$pt
})
centroid_coordinates <- do.call(rbind.data.frame, coordinates_value)
colnames(centroid_coordinates) <- c("x_coord", "y_coord")
centroid_coordinates$Row.No <- as.numeric(row.names(centroid_coordinates))
centroid_coordinates <- centroid_coordinates %>%
  dplyr::select(Row.No, x_coord, y_coord) %>%
  round(4)
Table(head(centroid_coordinates))
Row.No x_coord y_coord
1 13.9788 16.5238
2 4.4488 -6.6470
3 -6.8752 7.8261
4 -16.8254 10.1232
5 -4.2905 -12.9163
6 -20.1195 -3.9293
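For intuition, Sammon's non-linear mapping is also available in base R's recommended packages via MASS::sammon. A minimal sketch on hypothetical toy data (the HVT workflow performs the equivalent projection step on the 450 compressed centroids):

```r
library(MASS)

set.seed(1)
toy <- matrix(rnorm(30), ncol = 3)              # 10 hypothetical points in 3-D
proj <- sammon(dist(toy), k = 2, trace = FALSE) # project to 2-D
dim(proj$points)                                # 10 x 2 projected coordinates
proj$stress                                     # Sammon stress (lower is better)
```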

Let's visualize the projected Sammon's 2D coordinates for n_cells set to 450 on a plane.

# Scatter plot of the projected centroids (columns x_coord and y_coord)
library(ggplot2)
ggplot(centroid_coordinates, aes(x_coord, y_coord)) +
  geom_point(color = "blue") +
  labs(x = "X", y = "Y")

Figure 18: Sammon's 2D plot for 450 cells

7.3 Step 3: Tessellation

For more detailed information on Voronoi tessellation, please refer to section 4 of this vignette.

Now, we have obtained the centroid coordinates resulting from the application of Sammon’s projection.
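A plain Voronoi tessellation of 2-D points can be sketched with the deldir package (hypothetical points shown; plotHVT builds on the same idea, adding hierarchy and heatmap overlays):

```r
library(deldir)

set.seed(2)
x <- runif(20); y <- runif(20)  # hypothetical projected centroids
vt <- deldir(x, y)              # Dirichlet/Voronoi tessellation
plot(vt, wlines = "tess")       # draw only the Voronoi edges
```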

For better visualization, let's plot the Voronoi tessellation using the plotHVT function.

# Voronoi tessellation plot for level one

plotHVT(hvt.results,
        line.width = c(0.2), 
        color.vec = c("#141B41"),
        centroid.size = 0.01,
        maxDepth = 1,
        heatmap = '2Dhvt')

Figure 19: The Voronoi Tessellation for layer 1, shown for the 450 cells in the dataset 'computers'

Heat Maps

Now let’s plot the Voronoi Tessellation with the heatmap overlaid for all the features in the computers dataset for better visualization.

The heatmaps displayed below provide a visual representation of the spatial characteristics of the computers data, allowing us to observe patterns and trends in the distribution of each feature (n, price, speed, hd, ram, screen, ads). The green shades highlight regions with higher values in each heatmap, while the indigo shades indicate areas with the lowest values. By analyzing these heatmaps, we can gain insights into the variations in, and relationships between, these features within the computers data.

plotHVT(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "n",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 20: The Voronoi Tessellation with the heat map overlaid over the No. of entities in each cell in the 'computers' dataset

plotHVT(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "price",
  line.width = c(0.2),
  palette.color = 6,
  color.vec = c("#141B41"),
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 21: The Voronoi Tessellation with the heat map overlaid over the variable price in the 'computers' dataset

plotHVT(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "hd",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 22: The Voronoi Tessellation with the heat map overlaid over the variable hd in the 'computers' dataset

plotHVT(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "ram",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 23: The Voronoi Tessellation with the heat map overlaid over the variable ram in the 'computers' dataset

plotHVT(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "screen",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 24: The Voronoi Tessellation with the heat map overlaid over the variable screen in the 'computers' dataset

plotHVT(
  hvt.results,
  trainComputers,
  child.level = 1,
  hmap.cols = "ads",
  line.width = c(0.2),
  color.vec = c("#141B41"),
  palette.color = 6,
  centroid.size = 0.01,
  show.points = TRUE,
  quant.error.hmap = 0.2,
  n_cells.hmap = 15,
  heatmap = '2Dheatmap'
)

Figure 25: The Voronoi Tessellation with the heat map overlaid over the variable ads in the 'computers' dataset

7.4 Step 4: Scoring (scoreHVT)

For more detailed information on scoring please refer to section 5 of this vignette.

Testing Dataset

Now, let's have a look at the randomly selected testing dataset (1,237 data points) before we pass it to the scoreHVT function for scoring. For the sake of brevity, we are displaying the first six rows.

Table(head(testComputers))
price speed hd ram screen ads trend
1595 25 170 4 15 94 1
1849 25 170 8 14 94 1
2575 50 210 4 15 94 1
2195 33 170 8 15 94 1
2295 25 245 8 14 94 1
2699 50 212 8 14 94 1

Now that we have built the model, let us score our test dataset (1,237 data points) to determine which cell and which level each point belongs to.

scoreHVT(data,
         hvt.results.model,
         child.level,
         mad.threshold,
         line.width,
         color.vec,
         normalize,
         seed,
         distance_metric,
         error_metric,
         yVar)

The scoreHVT call with the important parameters is shown below:

set.seed(240)
scoring_comp <-scoreHVT(
  testComputers,
  hvt.results,
  child.level = 1,
  line.width = c(1.2),
  color.vec = c("#141B41"),
  normalize = TRUE
)

Let’s see which cell and level each point belongs to and check the mean absolute difference of each of the 1237 records. For the sake of brevity, we will only show the first 100 rows.

Act_pred_Table <- scoring_comp[["actual_predictedTable"]]
rownames(Act_pred_Table) <- NULL
Act_pred_Table %>% head(100) %>% as.data.frame() %>% Table(scroll = TRUE, limit = 100)
Row.No act_price act_speed act_hd act_ram act_screen act_ads act_trend Cell.ID pred_price pred_speed pred_hd pred_ram pred_screen pred_ads pred_trend diff
1 -1.0724 -1.2739 -0.9432 -0.7572 0.4470 -1.7123 -1.8937 9 -0.5945390 -0.9721862 -0.8206754 -0.7572441 0.4469548 -1.6720532 -1.715425 0.1601015
2 -0.6389 -1.2739 -0.9432 -0.0529 -0.6645 -1.7123 -1.8937 32 -0.2067297 -0.9438971 -0.7481777 -0.0529038 -0.6644773 -1.6527843 -1.718608 0.1702614
3 0.6000 -0.0952 -0.7900 -0.7572 0.4470 -1.7123 -1.8937 227 0.6450858 -0.0952219 -0.6866862 -0.3346399 0.4469548 -1.6103927 -1.639024 0.1325157
4 -0.0484 -0.8967 -0.9432 -0.0529 0.4470 -1.7123 -1.8937 112 0.3628254 -0.8967484 -0.7114692 -0.1641155 0.4469548 -1.6396003 -1.685937 0.1478183
5 0.1222 -1.2739 -0.6561 -0.0529 -0.6645 -1.7123 -1.8937 32 -0.2067297 -0.9438971 -0.7481777 -0.0529038 -0.6644773 -1.6527843 -1.718608 0.1408064
6 0.8116 -0.0952 -0.7824 -0.0529 -0.6645 -1.7123 -1.8937 141 0.6341640 -0.0952219 -0.7528605 -0.1535239 -0.6644773 -1.6500316 -1.711787 0.0788323
7 -0.2191 -0.8967 -0.6369 -0.7572 0.4470 -1.7123 -1.8937 9 -0.5945390 -0.9721862 -0.8206754 -0.7572441 0.4469548 -1.6720532 -1.715425 0.1219017
8 0.9755 0.6592 -1.0963 -0.7572 -0.6645 -1.7123 -1.8937 27 -0.0009503 0.6591560 -1.0633447 -0.7768091 -0.6644773 -1.6251376 -1.674395 0.1907927
9 1.1120 -0.0952 -0.7900 -0.7572 2.6698 -1.7123 -1.8937 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.2716746
10 3.0233 -0.8967 0.1364 -0.0529 -0.6645 -1.7123 -1.8937 350 2.7022675 -0.3957944 0.0780092 -0.0529038 -0.6644773 -1.5681686 -1.654941 0.1804637
11 1.4874 -0.8967 -0.2924 1.3558 -0.6645 -1.7123 -1.8937 291 0.9603741 -1.0128066 -0.2923752 1.3557766 -0.6644773 -1.6050309 -1.639024 0.1435926
12 0.6342 -0.8967 -0.7900 -0.0529 2.6698 -1.7123 -1.8937 336 0.8063686 -0.7510163 -0.6508398 -0.1809657 2.6698188 -1.2077714 -1.488538 0.2135411
13 0.3099 -0.0952 -0.7900 -0.0529 -0.6645 -1.7123 -1.8937 141 0.6341640 -0.0952219 -0.7528605 -0.1535239 -0.6644773 -1.6500316 -1.711787 0.1008934
14 0.3441 -0.0952 -0.7900 -0.0529 -0.6645 -1.7123 -1.8937 141 0.6341640 -0.0952219 -0.7528605 -0.1535239 -0.6644773 -1.6500316 -1.711787 0.0960077
15 0.4703 -0.8967 -0.7824 -0.0529 -0.6645 -1.7123 -1.8937 62 0.3724957 -0.8967484 -0.6991919 -0.0529038 -0.6644773 -1.6246907 -1.664491 0.0711294
16 0.8048 -0.8967 -0.6561 -0.0529 -0.6645 -1.7123 -1.8937 62 0.3724957 -0.8967484 -0.6991919 -0.0529038 -0.6644773 -1.6246907 -1.664491 0.1131842
17 -0.8096 -1.2739 -1.1346 -0.7572 -0.6645 -1.7123 -1.8937 4 -0.7309668 -1.1062979 -1.0261255 -0.7572441 -0.6644773 -1.6541806 -1.716840 0.0842509
18 1.3168 0.6592 -0.2924 -0.0529 0.4470 -1.7123 -1.8937 317 1.5829958 0.6591560 -0.4935330 -0.3730585 0.4469548 -1.6367141 -1.696903 0.1514227
19 -0.7311 -0.8967 -0.9432 -0.7572 0.4470 -1.7123 -1.8937 9 -0.5945390 -0.9721862 -0.8206754 -0.7572441 0.4469548 -1.6720532 -1.715425 0.0790262
20 -0.9017 -0.8967 -0.9432 -0.7572 -0.6645 -1.7123 -1.8937 4 -0.7309668 -1.1062979 -1.0261255 -0.7572441 -0.6644773 -1.6541806 -1.716840 0.0997576
21 1.4004 -0.0952 -0.6561 -0.0529 -0.6645 -1.7123 -1.8937 141 0.6341640 -0.0952219 -0.7528605 -0.1535239 -0.6644773 -1.6500316 -1.711787 0.1725496
22 -1.0143 -1.2739 -1.2877 -0.0529 -0.6645 -1.7123 -1.8937 1 -1.3513084 -1.2396475 -1.1711355 -0.9493368 -0.6644773 -1.6610859 -1.754783 0.2249167
23 0.6342 -1.2739 -0.6561 -0.0529 -0.6645 -1.7123 -1.8937 62 0.3724957 -0.8967484 -0.6991919 -0.0529038 -0.6644773 -1.6246907 -1.664491 0.1426847
24 -0.3829 -0.8967 -1.1346 -0.0529 -0.6645 -1.7123 -1.8937 32 -0.2067297 -0.9438971 -0.7481777 -0.0529038 -0.6644773 -1.6527843 -1.718608 0.1206320
25 0.3355 -0.8967 -0.9432 -0.0529 0.4470 -1.7123 -1.8937 112 0.3628254 -0.8967484 -0.7114692 -0.1641155 0.4469548 -1.6396003 -1.685937 0.0929755
26 0.6342 -1.2739 -0.2924 1.3558 -0.6645 -1.6989 -1.7664 291 0.9603741 -1.0128066 -0.2923752 1.3557766 -0.6644773 -1.6050309 -1.639024 0.1155119
27 -0.9017 -0.8967 -0.9432 -0.7572 -0.6645 -1.6989 -1.7664 4 -0.7309668 -1.1062979 -1.0261255 -0.7572441 -0.6644773 -1.6541806 -1.716840 0.0796576
28 1.3236 0.6592 -0.3307 -0.7572 0.4470 -1.6989 -1.7664 317 1.5829958 0.6591560 -0.4935330 -0.3730585 0.4469548 -1.6367141 -1.696903 0.1340203
29 1.3168 0.6592 -0.6369 -0.0529 2.6698 -1.6989 -1.7664 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.1560027
30 -0.1338 0.6592 -0.9432 -0.7572 -0.6645 -1.6989 -1.7664 27 -0.0009503 0.6591560 -1.0633447 -0.7768091 -0.6644773 -1.6251376 -1.674395 0.0626340
31 1.3168 0.6592 -0.2924 -0.0529 0.4470 -1.6989 -1.7664 317 1.5829958 0.6591560 -0.4935330 -0.3730585 0.4469548 -1.6367141 -1.696903 0.1313227
32 -1.2362 -1.2739 -0.9432 -0.7572 -0.6645 -1.6989 -1.7664 1 -1.3513084 -1.2396475 -1.1711355 -0.9493368 -0.6644773 -1.6610859 -1.754783 0.0884125
33 0.1734 0.6592 -0.6369 -0.0529 -0.6645 -1.6989 -1.7664 214 0.5488374 0.6591560 -0.7107818 -0.0529038 -0.6644773 -1.6192238 -1.654005 0.0916373
34 -0.0058 -0.8967 -0.6369 -0.7572 -0.6645 -1.6989 -1.7664 15 -0.1387528 -0.8967484 -0.7498526 -0.7572441 -0.6644773 -1.6821065 -1.776970 0.0390548
35 0.6000 0.6592 -0.6369 -0.0529 0.4470 -1.6989 -1.7664 269 0.6654504 0.6591560 -0.6624406 -0.1702939 0.4469548 -1.5044974 -1.575357 0.0848457
36 -0.0484 -0.8967 -0.9432 -0.0529 0.4470 -1.6989 -1.7664 112 0.3628254 -0.8967484 -0.7114692 -0.1641155 0.4469548 -1.6396003 -1.685937 0.1277183
37 1.4362 -0.0952 -0.6369 -0.7572 2.6698 -1.6989 -1.7664 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.2366132
38 -0.3898 -0.8967 -1.0963 -0.7572 -0.6645 -1.6989 -1.7664 12 -0.3851096 -0.8967484 -1.0133646 -0.7572441 -0.6644773 -1.6184353 -1.660246 0.0391942
39 -1.0724 -0.8967 -1.2686 -1.1094 -0.6645 -1.6989 -1.7664 2 -1.0176168 -0.6963668 -1.2685821 -1.1094142 -0.6644773 -1.6027968 -1.639024 0.0683786
40 2.6820 -0.8967 0.1364 -0.0529 -0.6645 -1.6989 -1.7664 350 2.7022675 -0.3957944 0.0780092 -0.0529038 -0.6644773 -1.5681686 -1.654941 0.1173973
41 0.8901 0.6592 -0.6369 -0.7572 0.4470 -1.6989 -1.7664 269 0.6654504 0.6591560 -0.6624406 -0.1702939 0.4469548 -1.5044974 -1.575357 0.1746616
42 0.2997 -0.0952 -0.7824 -0.7572 -0.6645 -1.6989 -1.7664 21 -0.1794639 -0.0952219 -0.9934693 -0.7732518 -0.6644773 -1.6446350 -1.708479 0.1169308
43 0.7195 -0.8967 -0.6369 -0.0529 2.6698 -1.6989 -1.7664 336 0.8063686 -0.7510163 -0.6508398 -0.1809657 2.6698188 -1.2077714 -1.488538 0.1633667
44 2.6820 0.6592 0.3201 -0.0529 -0.6645 -1.6989 -1.7664 381 2.9542599 0.6591560 0.1634825 -0.0529038 -0.4079929 -1.6751465 -1.766358 0.1013182
45 1.3236 0.6592 -0.6561 1.3558 0.4470 -1.6318 -1.6390 361 1.3194314 0.4076967 -0.3970144 1.3557766 0.4469548 -1.0852357 -1.398504 0.1859838
46 -0.5536 -0.8967 -0.9432 -0.7572 -0.6645 -1.6318 -1.6390 12 -0.3851096 -0.8967484 -1.0133646 -0.7572441 -0.6644773 -1.6184353 -1.660246 0.0390545
47 0.4703 -0.0952 -0.7900 -0.7572 -0.6645 -1.6318 -1.6390 141 0.6341640 -0.0952219 -0.7528605 -0.1535239 -0.6644773 -1.6500316 -1.711787 0.1279632
48 -1.2362 -1.2739 -0.9432 -0.7572 -0.6645 -1.6318 -1.6390 3 -1.2394185 -1.2739374 -0.9967755 -0.8355041 -0.6644773 -1.2937495 -1.440948 0.0958943
49 1.1291 0.6592 -0.9432 -0.0529 0.4470 -1.6318 -1.6390 269 0.6654504 0.6591560 -0.6624406 -0.1702939 0.4469548 -1.5044974 -1.575357 0.1504054
50 2.8527 0.6592 0.3201 -0.0529 0.4470 -1.6318 -1.6390 381 2.9542599 0.6591560 0.1634825 -0.0529038 -0.4079929 -1.6751465 -1.766358 0.1834176
51 1.5557 0.6592 -0.6369 -0.0529 2.6698 -1.6318 -1.6390 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.1623455
52 -0.0416 -0.0952 -0.7786 -0.0529 -0.6645 -1.6318 -1.6390 138 0.1135166 -0.0952219 -0.6637167 -0.0529038 -0.6644773 -1.5331342 -1.615872 0.0559774
53 -0.5177 -0.8967 -0.9432 -0.7572 -0.6645 -1.6318 -1.6390 12 -0.3851096 -0.8967484 -1.0133646 -0.7572441 -0.6644773 -1.6184353 -1.660246 0.0339259
54 1.1461 -0.0952 -0.2924 1.3558 -0.6645 -1.6318 -1.6390 291 0.9603741 -1.0128066 -0.2923752 1.3557766 -0.6644773 -1.6050309 -1.639024 0.1614567
55 1.1803 -0.0952 -0.6369 -0.0529 2.6698 -1.6318 -1.6390 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.1635690
56 -0.0484 -1.2739 -0.6561 -0.0529 -0.6645 -1.6318 -1.6390 32 -0.2067297 -0.9438971 -0.7481777 -0.0529038 -0.6644773 -1.6527843 -1.718608 0.0972899
57 -1.0724 -1.2739 -0.9432 -0.7572 0.4470 -1.6318 -1.6390 9 -0.5945390 -0.9721862 -0.8206754 -0.7572441 0.4469548 -1.6720532 -1.715425 0.1455523
58 -1.2430 -1.2739 -0.9432 -0.7572 -0.6645 -1.6318 -1.6390 3 -1.2394185 -1.2739374 -0.9967755 -0.8355041 -0.6644773 -1.2937495 -1.440948 0.0959462
59 1.8356 0.6592 -0.6561 1.3558 0.4470 -1.6318 -1.6390 361 1.3194314 0.4076967 -0.3970144 1.3557766 0.4469548 -1.0852357 -1.398504 0.2591267
60 -0.9017 -0.0952 -1.2686 -1.1094 -0.6645 -1.6318 -1.6390 2 -1.0176168 -0.6963668 -1.2685821 -1.1094142 -0.6644773 -1.6027968 -1.639024 0.1065951
61 1.3236 0.6592 -0.9432 -0.7572 -0.6645 -1.6318 -1.6390 153 1.4399884 0.3574048 -0.8283320 -0.4755080 -0.6644773 -1.6720532 -1.791825 0.1439778
62 1.3850 -0.0952 -0.6369 -0.0529 2.6698 -1.6318 -1.6390 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.1628200
63 0.5488 0.6592 -0.6369 -0.0529 0.4470 -1.6318 -1.6390 269 0.6654504 0.6591560 -0.6624406 -0.1702939 0.4469548 -1.5044974 -1.575357 0.0643743
64 0.6342 0.6592 -0.6561 -0.0529 -0.6645 -1.6318 -1.6390 214 0.5488374 0.6591560 -0.7107818 -0.0529038 -0.6644773 -1.6192238 -1.654005 0.0239565
65 0.4294 -0.0952 -0.6369 -0.0529 -0.6645 -1.6318 -1.6390 138 0.1135166 -0.0952219 -0.6637167 -0.0529038 -0.6644773 -1.5331342 -1.615872 0.0663631
66 -0.3983 -1.2739 -1.0963 -0.7572 -0.6645 -1.5246 -1.5117 12 -0.3851096 -0.8967484 -1.0133646 -0.7572441 -0.6644773 -1.6184353 -1.660246 0.1022466
67 -0.3829 -0.8967 -0.7786 -0.0529 -0.6645 -1.5246 -1.5117 32 -0.2067297 -0.9438971 -0.7481777 -0.0529038 -0.6644773 -1.6527843 -1.718608 0.0841298
68 1.1461 -0.0952 -0.2924 1.3558 -0.6645 -1.5246 -1.5117 352 1.4390782 0.3071130 -0.0922847 1.3557766 -0.6644773 -1.2538340 -1.511690 0.1666041
69 0.8048 0.6592 -0.6369 -0.0529 -0.6645 -1.5246 -1.5117 214 0.5488374 0.6591560 -0.7107818 -0.0529038 -0.6644773 -1.6192238 -1.654005 0.0809776
70 -0.3898 -0.0952 -0.9432 -0.7572 -0.6645 -1.5246 -1.5117 21 -0.1794639 -0.0952219 -0.9934693 -0.7732518 -0.6644773 -1.6446350 -1.708479 0.0847880
71 0.6410 -0.8967 -0.6561 1.3558 0.4470 -1.5246 -1.5117 313 0.8393271 -0.9805682 -0.0852237 1.3557766 0.4469548 -0.9467230 -1.356059 0.2266655
72 -0.2123 -0.8967 -0.7824 -0.7572 -0.6645 -1.5246 -1.5117 15 -0.1387528 -0.8967484 -0.7498526 -0.7572441 -0.6644773 -1.6821065 -1.776970 0.0755694
73 -0.0484 -0.0952 -1.0963 -0.7572 -0.6645 -1.5246 -1.5117 21 -0.1794639 -0.0952219 -0.9934693 -0.7732518 -0.6644773 -1.6446350 -1.708479 0.0809722
74 -0.0570 -0.8967 -1.0963 -0.7572 -0.6645 -1.5246 -1.5117 12 -0.3851096 -0.8967484 -1.0133646 -0.7572441 -0.6644773 -1.6184353 -1.660246 0.0933631
75 -0.8949 -0.8967 -1.1346 -0.7572 -0.6645 -1.5246 -1.5117 4 -0.7309668 -1.1062979 -1.0261255 -0.7572441 -0.6644773 -1.6541806 -1.716840 0.1166846
76 0.3099 -0.0952 -0.6369 -0.0529 -0.6645 -1.5246 -1.5117 138 0.1135166 -0.0952219 -0.6637167 -0.0529038 -0.6644773 -1.5331342 -1.615872 0.0479936
77 0.5147 -0.0952 -0.6369 -0.0529 -0.6645 -1.5246 -1.5117 138 0.1135166 -0.0952219 -0.6637167 -0.0529038 -0.6644773 -1.5331342 -1.615872 0.0772507
78 1.5386 0.6592 -0.6369 -0.0529 2.6698 -1.5246 -1.5117 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.1459897
79 0.1222 -0.8967 -0.6561 -0.0529 -0.6645 -1.5246 -1.5117 62 0.3724957 -0.8967484 -0.6991919 -0.0529038 -0.6644773 -1.6246907 -1.664491 0.0780492
80 1.4533 -0.0952 -0.6369 -0.0529 2.6698 -1.5246 -1.5117 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.1586641
81 0.1137 -1.2739 -0.6561 -0.0529 -0.6645 -1.5246 -1.5117 62 0.3724957 -0.8967484 -0.6991919 -0.0529038 -0.6644773 -1.6246907 -1.664491 0.1331354
82 -0.5536 -0.0952 -1.1346 -0.7572 -0.6645 -1.5246 -1.5117 21 -0.1794639 -0.0952219 -0.9934693 -0.7732518 -0.6644773 -1.6446350 -1.708479 0.1211682
83 0.2997 -0.0952 -0.3689 -0.0529 -0.6645 -1.1091 -1.3844 222 0.6527225 -0.0952219 -0.1655640 -0.0529038 -0.6644773 -0.8610834 -1.320688 0.1240193
84 0.4550 -0.8967 -0.2924 -0.0529 -0.6645 -1.1091 -1.3844 156 0.3059410 -0.9596133 -0.3306578 -0.0529038 -0.6644773 -0.8610834 -1.320688 0.0802836
85 0.6256 -0.8967 -0.2924 -0.0529 0.4470 -1.1091 -1.3844 188 0.2523275 -0.9281809 -0.5220710 -0.0529038 0.4469548 -0.9024138 -1.331299 0.1277515
86 0.9755 -0.8967 0.1364 1.3558 -0.6645 -1.1091 -1.3844 280 0.8156769 -1.0339081 0.1363902 1.3557766 -0.6644773 -0.8385395 -1.314900 0.0910210
87 1.4789 0.6592 0.1364 1.3558 -0.6645 -1.1091 -1.3844 352 1.4390782 0.3071130 -0.0922847 1.3557766 -0.6644773 -1.2538340 -1.511690 0.1275233
88 0.2843 -1.2739 -0.2924 -0.0529 -0.6645 -1.1091 -1.3844 156 0.3059410 -0.9596133 -0.3306578 -0.0529038 -0.6644773 -0.8610834 -1.320688 0.0979915
89 -0.9102 -0.8967 -1.1844 -1.1094 0.4470 -1.1091 -1.3844 8 -1.2003579 -1.0584009 -1.1646722 -1.0087941 0.4469548 -0.7548052 -1.293402 0.1453615
90 -0.3898 -0.8967 -0.7747 -0.7572 -0.6645 -1.1091 -1.3844 40 -0.4747405 -0.9721862 -0.8788650 -0.7572441 -0.6644773 -1.1090659 -1.384355 0.0378196
91 0.6512 -0.8967 -0.6369 -0.0529 2.6698 -1.1091 -1.3844 336 0.8063686 -0.7510163 -0.6508398 -0.1809657 2.6698188 -1.2077714 -1.488538 0.0922409
92 0.6854 0.6592 -0.6369 -0.0529 0.4470 -1.1091 -1.3844 269 0.6654504 0.6591560 -0.6624406 -0.1702939 0.4469548 -1.5044974 -1.575357 0.1070468
93 0.2587 -0.0952 -0.6369 -0.0529 0.4470 -1.1091 -1.3844 240 0.6768273 -0.0952219 -0.5794949 -0.0529038 0.4469548 -1.1090659 -1.384355 0.0679546
94 0.2843 0.6592 -0.7747 -0.7572 0.4470 -1.1091 -1.3844 269 0.6654504 0.6591560 -0.6624406 -0.1702939 0.4469548 -1.5044974 -1.575357 0.2381085
95 1.1632 0.6592 -0.6369 -0.0529 2.6698 -1.1091 -1.3844 375 1.2852716 0.3690107 -0.6722566 -0.2696239 2.6698188 -1.3668439 -1.580254 0.1597084
96 0.3952 -0.0952 -0.6369 -0.0529 0.4470 -1.1091 -1.3844 240 0.6768273 -0.0952219 -0.5794949 -0.0529038 0.4469548 -1.1090659 -1.384355 0.0484546
97 2.3407 -0.0952 0.1364 -0.0529 -0.6645 -1.1091 -1.3844 350 2.7022675 -0.3957944 0.0780092 -0.0529038 -0.6644773 -1.5681686 -1.654941 0.2071698
98 1.3082 -0.0952 0.1364 1.3558 0.4470 -1.1091 -1.3844 361 1.3194314 0.4076967 -0.3970144 1.3557766 0.4469548 -1.0852357 -1.398504 0.1550827
99 -0.0570 -0.8967 -0.7747 -0.7572 -0.6645 -1.1091 -1.3844 40 -0.4747405 -0.9721862 -0.8788650 -0.7572441 -0.6644773 -1.1090659 -1.384355 0.0853625
100 -0.1338 -0.0952 -0.9432 -0.7572 -0.6645 -1.1091 -1.3844 21 -0.1794639 -0.0952219 -0.9934693 -0.7732518 -0.6644773 -1.6446350 -1.708479 0.1388063
hist(Act_pred_Table$diff, breaks = 20, col = "blue",
     main = "Mean Absolute Difference", xlab = "Difference",
     xlim = c(0, 0.6), ylim = c(0, 250))

Figure 26: Mean Absolute Difference
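The diff column can be reproduced by hand: assuming it is the mean absolute difference between a record's normalized feature values and its assigned centroid, row 1 of the scoring table above gives:

```r
# Actual and predicted (centroid) values for row 1 of the scoring table
act  <- c(-1.0724, -1.2739, -0.9432, -0.7572, 0.4470, -1.7123, -1.8937)
pred <- c(-0.5945390, -0.9721862, -0.8206754, -0.7572441,
           0.4469548, -1.6720532, -1.7154250)
mad_row <- mean(abs(act - pred))
round(mad_row, 7)  # 0.1601015, matching the diff column for row 1
```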

8. Executive Summary

9. Applications

  1. Pricing Segmentation - The package can be used to discover groups of similar customers based on the customer spend pattern and understand price sensitivity of customers

  2. Market Segmentation - The package can be helpful in market segmentation where we have to identify micro and macro segments. The method used in this package can do both kinds of segmentation in one go

  3. Anomaly Detection - This method can help us categorize system behavior over time and detect anomalies when the system changes, e.g., finding fraudulent claims in healthcare insurance

  4. The package can help us understand the underlying structure of the data. Suppose we want to analyze a curved surface such as a sphere or a vase; we can approximate it by many small low-order polygons in the form of tessellations using this package

  5. In biology, Voronoi diagrams are used to model a number of different biological structures, including cells and bone microarchitecture

  6. Using the base idea of Systems Dynamics, these diagrams can also be used to depict customer state changes over a period of time

10. References

  1. Topology Preserving Maps : https://users.ics.aalto.fi/jhollmen/dippa/node9.html

  2. Vector Quantization : https://en.wikipedia.org/wiki/Vector_quantization

  3. K-means : https://en.wikipedia.org/wiki/K-means_clustering

  4. Sammon’s Projection : https://en.wikipedia.org/wiki/Sammon_mapping

  5. Voronoi Tessellations : https://en.wikipedia.org/wiki/Centroidal_Voronoi_tessellation

  6. Embedding : https://en.wikipedia.org/wiki/Embedding